In [1]:
import getpass
import psycopg2

from sqlalchemy import create_engine
from psycopg2.extensions import register_adapter, AsIs
from IPython.display import Image

Tindify: A Matchmaking Application that Uses Song Streaming Data¶

Even before the age of digitalization, people have been exploring different means of self-promotion in order to be able to find desired partners or friends that are outside of one’s network. As early as the 1600s, different bachelors have explored using newspapers as a way to look for potential suitors [1]. This practice has continued until the 1960s, where the first computer-assisted matching platforms were developed by Harvard students. Dubbed as Operation Match, this platform utilizes a 75 question survey for individuals who would like to find potential suitors and then utilizes an IBM 1401 computer to generate a list of potential matches for each individual in exchange for a fee. These initial efforts in utilizing technology to form connections between people has led to the evolution of the online dating space. In the early 21st century, matchmaking platforms would leverage the results of different psychological tests and then apply statistical methods such as factor analysis to identify the profiles of each user. The recommended matches generated by these methods would highly likely come from those that have similar profiles as the user. However, different researchers that take an interest in the mechanisms behind online dating have critiqued the possibility of generating inaccurate matches when using survey data in finding match recommendations due to prevalent discrepancies between the characteristics of the person as reflected on the survey results versus their actual personalities as they reveal themselves to the other party. Likewise, the responses of people on a survey may only be relevant at one particular point in time. Hence, this methodology does not capture the dynamic changes in the characteristics of individuals.

With this, players in the technology industry have explored the utilization of online dating applications which attempt to integrate information that are reflective of the user’s current characteristics - such as demographic information on social media along with user detected activity when navigating through the internet. The widespread use of social media which started in the late 2000s has led to the curation of different online dating platforms targeted towards specific kinds of demographics, including the members of the LGBTQ+ community along with other minority groups. Tinder, which was established in 2012, is a global online dating platform that aims to connect different people who have the potential to be romantic lovers or very close friends. This dating platform is often used by travelers who would like to meet locals in the country that they are traveling to. The platform tends to use collected user demographics, social media activity data, along with the user’s level of activity in Tinder to generate a compilation of profiles which may be of interest to the user. Once this list of potential matches was generated by the application, each user can either swipe right to express their interest towards meeting the potential match or swipe left if they are not interested. Tinder is the most downloaded dating platform in the entire world as of 2021, and is available in more than 190 countries and 40 languages.

Although Tinder can effectively generate recommendations to each user for potential matches, it is more likely that the relationships formed from existing online platforms do not last for a significant period of time. Given Tinder’s reputation of being a “hookup app”, there may be differences in the expectations of two different individuals meeting together for the first time in terms of what they are looking for in a relationship. Moreover, it is also likely that the individuals that meet through the app would have unmatched expectations due to differences between the personas defined by each user through the dating application in relation to their actual personalities in real life.

The team believes that using demographic information as the sole basis for making user recommendations could lead to friendships and relationships that are fragile as Tinder tends to only use information inputted by each user to create profiles. Although Tinder tends to use social media information for users that log in through OAuth, this information collected may actually be not useful given that it is easy for individuals to make alternative personas online. With this, the team believes that it is necessary for dating platforms to utilize information that reveals the characteristics of each individual user as this is indicative of their actual behaviors and personalities. Hence, our team of data engineers have made Tindify, a matchmaking application that uses song streaming information of each user (which is collected from Spotify) in order to match two individuals who have similar music tastes.

Design Considerations¶

In making the application, the team considered that the application should have a relatively similar design as Tinder in terms of system performance. The application should have a low latency so that when users log in to their own profiles, they are able to immediately see their recommended matches upon putting in their user credentials. Likewise, the system must be able to allow song listening logs (which are in JSON format) to be read into the database and to be integrated with CSV file formats which contain information regarding the characteristics of each song. The system must also be made to allow for existing users to listen to new songs and for new songs to be added in the existing database.

Similar to Tinder’s existing system design, it is okay for the user profiles to be not updated in real time[3] as it may take some time before a particular user would actually be shown to different potential matches. It must be noted though that this does not mean that the application would not be able to generate recommended matches instantly, rather new information regarding each user (in particular, song listening activity) need not to be integrated immediately in real time when making user recommended matches.

Lastly, Tindify must have the capability to generate recommendations which takes into account the distance preferences of each user. Since Tindify would like to encourage people to meet together and continue their conversations outside the dating application, the platform must not match together two different people who are geographically far from each other even though they have similar music listening preferences. The recommendation engine employed by the system must first filter candidate matches on the basis of the location criteria before looking at their similarity in music tastes.

Tindify Features, Specifications and Processes¶

Features:

  1. Allows users to listen to different songs
  2. Match users listening to similar songs

Specifications:

  1. Should have a system to add new songs
  2. Should be able to identify the exact location of the users
  3. Should be able to match users according to song preferences within a certain distance threshold between users

Processes:

  1. Songs update
  2. User matching through recommendation system
  3. Reporting and analysis

Architecture¶

The data architecture of Tindify is designed based on its features and specifications. Figure below shows the overview of the Data Architecture.

In [2]:
Image(filename='./fp_images/Archi.png') 
Out[2]:

Figure: Architectural Diagram of Tindify

Tindify is hosted on Amazon Web Services or AWS Cloud and its data architecture is composed of the following:

Data Lake : Amazon S3¶

As we envision Tindify to operate at scale comparable to Tinder and to handle data of different structures, it will need a repository design to store, process and secure large amounts of structured, semi structured and unstructured data. For this purpose, we design the data architecture to include a data lake hosted in an Amazon S3 bucket that will handle streaming data from the app.

Since it is expected that the Tindify app will generate large amounts of data during streaming, it is imperative to have a data architecture that consumes data immediately as it is generated. Hence, data lake is found to be the most suitable since it does not require data processing during ingestion. This ensures smooth and uninterrupted streams of events. Aside from the stream data, which is an unstructured data in JSON file format, song features data will also be ingested in the data lake. Song features data will be either structured or semi-structured. These data will be ingested to the data lake by batch, see Figure 2 for the screen shots of the file formats.

Tindify’s data lake is composed of the following zones:

  1. Gold - Storage for processed data. At the time, no data is stored here because synthetic data is being used.
  1. Landing - Raw data logs in the form of json strings collected from the streaming service are fed into this bucket. The raw data for the music features are also stored here.
  1. Work - copies of the working databases are stored here.
  1. Sensitive - For this service, user data that is considered sensitive and will be stored here. At the time, synthetic data used so no files are currently stored here.

  2. Recommender - Outputs generated from the recommender systems are stored here to be pulled by the API.

User Permissions

  1. Data Engineers - Landing Zone, Work Zone

  2. Data Steward - Sensitive Zone

  3. Data Scientists - Work Zone, Gold Zone, Recommender

  4. Business Analysts - Gold Zone

In [17]:
Image(filename='./fp_images/S31.png') 
Out[17]:
In [18]:
Image(filename='./fp_images/S32.png') 
Out[18]:
In [19]:
Image(filename='./fp_images/S33.png') 
Out[19]:
In [20]:
Image(filename='./fp_images/S34.png') 
Out[20]:
In [21]:
Image(filename='./fp_images/S35.png') 
Out[21]:
In [22]:
Image(filename='./fp_images/S36.png') 
Out[22]:
In [23]:
Image(filename='./fp_images/S37.png') 
Out[23]:
In [24]:
Image(filename='./fp_images/S38.png') 
Out[24]:
In [25]:
Image(filename='./fp_images/S39.png') 
Out[25]:
In [26]:
Image(filename='./fp_images/S310.png') 
Out[26]:
In [27]:
Image(filename='./fp_images/S311.png') 
Out[27]:
In [28]:
Image(filename='./fp_images/S312.png') 
Out[28]:

The data lake would also contain CSV files containing a list of the music characteristics of each song being listened to by every user. The CSV files contain song characteristics collected from the Spotify API, such as a song’s instrumentality, valence, danceability, etc. This information would be used to find the potential matches for a user. Seen in Figure 3 is a snippet of the CSV files stored in the S3 bucket.

In [37]:
Image(filename='./fp_images/files.png') 
Out[37]:

Figure: Raw Data File Formats of Files are grouped by time interval

In [36]:
Image(filename='./fp_images/logs.png') 
Out[36]:

Figure: Content of the files

In [50]:
Image(filename='./fp_images/music_feats.png') 
Out[50]:

Figure: Song Characteristics

No SQL Database : Amazon DynamoDB¶

Amazon DynamoDB is the NoSQL database used in this data architecture. Since we are dealing with streaming data, which is unstructured, NoSQL allows us to store them in huge amounts, giving more flexibility on handling them.

Schema

In [16]:
Image(filename='./fp_images/DDB1.png') 
Out[16]:

The Primary Key is composed of the userID attribute is used as the Partition Key while the timestamp attribute is used as the Sort Key. These Keys allow us to query the necessary information needed per batch and by user.

In [6]:
Image(filename='./fp_images/DDB2.png') 
Out[6]:
In [7]:
Image(filename='./fp_images/DDB3.png') 
Out[7]:
In [11]:
Image(filename='./fp_images/DDB4.png') 
Out[11]:
In [12]:
Image(filename='./fp_images/DDB5.png') 
Out[12]:
In [13]:
Image(filename='./fp_images/DDB6.png') 
Out[13]:
In [14]:
Image(filename='./fp_images/DDB7.png') 
Out[14]:
In [15]:
Image(filename='./fp_images/DDB8.png') 
Out[15]:

OLTP Database : Amazon Related Database Service (RDS)¶

Amazon RDS is the Online Transaction Processing (OLTP) database of the Tindify Architecture. This stores the following relevant information:

  1. User Profiles
  2. User Song Streaming Activity
  3. Song Features and Attributes
  4. Song Artist Information

The information stored here will be then used for data mining and business analytics, therefore, this database will be made accessible to relevant users in the organization to perform certain operations relevant to the business and can be also used for further processing.

The input data for the User Profile and User Song Streaming Activity comes from Amazon DynamoDB while the Song Features and Attributes and Song Artist Information input data comes from the data lake. They are transformed prior loading to the OLTP database.

In [33]:
Image(filename='./fp_images/RDS1.png') 
Out[33]:

Schema

The OLTP database is used in order to be able to integrate the different data sources used in generating matches for each user on the basis of their song preferences. The OLTP database would also be used as the main source for extracting the song feature vectors for each user looking for a match on Tindify. The OLTP database consists of five normalized tables, namely users, user locations, listens, artists, and songs. The songs table contains all of the characteristics of songs that are being listened to by each user, whereas the artists table contains the singers (could be multiple) for each song. The listens table contains the title of the song being listened to by the user and the timestamp in which the song is actually listened by the user. The users table would contain demographic information about each user on Tindify, along with the type of their subscription on Tindify. Lastly, each user tends to reside on a specific location which is represented by geospatial coordinates.

In [31]:
Image(filename='./fp_images/oltp_schema.png') 
Out[31]:

OLAP Database: Amazon Redshift¶

Amazon Redshift is the Online Analytical Processing (OLAP) database of the Tindify Data Architecture. The input data comes from the OLTP database (Amazon RDS) that is transformed prior loading.

Data analysis on users and songs are performed here which are then reported through the dashboard.

In [35]:
Image(filename='./fp_images/Redshift1.png') 
Out[35]:

Schema

The OLAP database above considers the process of a user listening to a particular song, such that the grain is represented by each song being listened to by a user. The fact table of the OLAP schema would contain the song that a user would listen to at a particular point in time, along with information related to each user such as the artist of the song that they are listening to and their current location. Linked to the fact table are five different dimensions - the song dimension which contains the different song features that are used in generating recommendations, the artist of the song being listened to by the user, the location of each user, some demographic information about the user, and the particular point in time in which the user listens to a given song. This OLAP database would be connected to Google Data Studio in order to generate data visualizations that provide Tindify an idea of the characteristics of the users that are looking for matches through the application.

In [34]:
Image(filename='./fp_images/olap_schema.png') 
Out[34]:

ETL Jobs¶

1. Stream Logs from S3 Data Lake to NoSQL Database (DynamoDB)

Raw user stream logs data are extracted from the data lake and are transformed to the format required . During transformation, the data is cleaned and formatted according to the format requirements of NoSQL. This ETL job will be done by batch every hour.

Schedule: Every hour

In [41]:
Image(filename='./fp_images/etl1.png') 
Out[41]:

Figure: User Stream Logs Data Flow to AmazonDynamoDB

2. Processed stream logs from NoSQL Database (DynamoDB) to OLTP Database (Amazon RDS)

The user stream logs data processed in Amazon DynamoDB will be transformed to suit the required OLTP schema (refer to Figure 6). This will be done by batch every hour. The normalized form of the data is needed for the Recommender System. This will be done every hour since recommendation of matches is not required to be real time. In addition, this will also be used for data analytics in OLAP, hence, does not require them to be in real time.

Schedule: Every hour

In [42]:
Image(filename='./fp_images/etl2.png') 
Out[42]:

Figure: Processed User Stream Logs Data Flow to Amazon RDS

3. Song Features from S3 Data Lake to to OLTP Database (Amazon RDS)

Song Features data are extracted from the data lake and are transformed to the format required by the schema (refer to Figure 6). This will be done by batch every hour. The normalized form of the data is needed for the Recommender System. This will be done every hour since recommendation of matches is not required to be real time. In addition, this will also be used for data analytics in OLAP, hence, does not require them to be in real time.

Schedule: Every hour

In [44]:
Image(filename='./fp_images/etl3.png') 
Out[44]:

4. Processed stream logs from OLTP Database (Amazon RDS) and preprocessed song features from S3 Data Lake to OLAP Database (Amazon Redshift)

Processed user and song information data will be extracted from the OLTP database and then transformed to suit the required schema , refer to Figure 7. This will be done by batch daily. The normalized form of the data will be used in the OLAP database to perform the data analytics as required by the business and reporting of the data analytics through the dashboard is done daily, hence real time processing will not be required.

In [45]:
Image(filename='./fp_images/etl4.png') 
Out[45]:

Recommender System¶

Location Matching

For matching based on location, the Haversine algorithm [4] was used. It calculates the shortest distance between two points using the following formula.

In [48]:
Image(filename='./fp_images/haversine.png') 
Out[48]:

The input data comes from the user location table in the RDS, which are vectorized and broadcasted to the formula for the pairwise comparison of the user’s location relative to the location of all other registered users. Individuals that are not within the distance threshold that was inputted by the user would not be included anymore on the list of potential candidates for that specific user’s matches. Their similarity of song preferences would not be calculated anymore.

Song Matching

Once a list of candidates for potential candidates have been generated, the top three users in the database with the most similar song preferences relative to the one looking for potential matches would be identified by employing an information retrieval based recommender system. The cosine similarity metric would be used in order to identify the top three matches that have the most similar song feature vectors relative to the characteristics of the songs that are being listened to by the user. For each user, the average value of the song features (e.g. valence, instrumentality, etc.) would be calculated and the resulting vector for each user would be used for similarity calculation.

REST API Gateway¶

The recommendation results in CSV format are pushed into the REST API Gateway. That app then pulls the data from the API to show the match results to the users.

In [49]:
Image(filename='./fp_images/APIGateway.png') 
Out[49]:

Application¶

Tindify App will provide users his/her matches based on their song preferences and proximity with other users. The user will just choose his/her preferred distance threshold. The proof-of-concept app was created using tkinter, which simply filters the specific user information from the generated recommender matrices.

In [46]:
Image(filename='./fp_images/app.png') 
Out[46]:

Tindify Dashboard¶

To allow Tindify’s data scientists to have an understanding regarding the usage of the application along with the characteristics of the users that utilize the matching platform, a dashboard containing the different data visualizations would be generated from the collected user and song streaming data. After generating an OLAP database via Amazon Redshift, the resulting data warehouse was connected to Google Data Studio in order to generate summary statistics containing the most frequently listened song per day, along with the gender distribution of users in the application and the number of songs that each user tends to listen at a particular point in time. The dashboard is updated with new data by batch. (Refer to Figure 18 for the Sample of the Dashboard)

In [47]:
Image(filename='./fp_images/dashboard.png') 
Out[47]:

Implementation¶

ETL operations were implemented through a single DAG called fp_dag in Apache Airflow. The DAG is comprised of 7 tasks, with the last task just a confirmation of a run of the graph. Each task is divided into its own .py file and is imported into the main fp_dag.py file for the actual run of each file. The code behind the implementation heavily relied on the use of Boto3 and Psycopg2.

In [51]:
Image(filename='./fp_images/AF1.png') 
Out[51]:
In [52]:
Image(filename='./fp_images/AF2.png') 
Out[52]:
In [53]:
Image(filename='./fp_images/AF3.png') 
Out[53]:

Code for fp_dag.py¶

In [ ]:
import datetime
from update_ddb import update_ddb
from update_oltp import update_oltp
from recsys_music import recsys_music
from recsys_locs import recsys_locs
from user_map import user_map
from update_olap import update_olap

from airflow import DAG
from airflow.decorators import task
from airflow.operators.python import PythonOperator

with DAG(
    'fp_dag',
    start_date=datetime.datetime(2022, 6, 18),
    schedule_interval='@hourly'
    
) as dag:
    
    update_ddb = PythonOperator(
        task_id='update_ddb',
        python_callable= update_ddb
    )
    
    update_oltp = PythonOperator(
        task_id='update_oltp',
        python_callable= update_oltp
    )
    
    recsys_music = PythonOperator(
        task_id='recsys_music',
        python_callable= recsys_music
    )
    
    recsys_locs = PythonOperator(
        task_id='recsys_locs',
        python_callable= recsys_locs
    )
        
    user_map = PythonOperator(
        task_id='user_map',
        python_callable= user_map
    )
    
    update_olap = PythonOperator(
        task_id='update_olap',
        python_callable= update_olap
    )
    
    @task
    def dag_status():
        print('Success')
    
    update_ddb >> update_oltp >> [recsys_music, recsys_locs, user_map, update_olap] >> dag_status()

Task 1: update_ddb.py - The first task pulls the raw .json logs from the S3 bucket and transforms the files to be uploaded into DynamoDB then saves the data into the datalake as json files

In [54]:
import glob
import json
import pandas as pd
import boto3
from dynamodb_json import json_util

def update_ddb():

    bucket_name = 'finalproj-s3'
    _PREFIX = 'landing/logs'

    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    s3_client = boto3.client('s3')

    dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
    logs = dynamodb.Table('fp-logs')
    ddb_client = boto3.client('dynamodb')

    for s3_object in bucket.objects.filter(Prefix=_PREFIX).all():
        if s3_object.key == _PREFIX:
            continue
        else:
            content_object = s3.Object(bucket_name, s3_object.key)
            file_content = content_object.get()['Body']
            json_list = [json.loads(line) for line in file_content.iter_lines()]
            json_df = pd.DataFrame(json_list)

            for col in json_df.columns:
                try:
                    json_df[col] = json_df[col].str.replace('\'','`')
                except:
                    continue
            json_dict = json_df.to_dict('records')

            for row in json_dict:
                try:
                    ddb_client.execute_statement(
                    Statement=f"""
                    INSERT INTO "fp-logs" VALUE {str(row)}
                    """
                    )
                except:
                    continue

    ddb_output = ddb_client.execute_statement(
    Statement="""
    SELECT *
    FROM "fp-logs"
    """
    )['Items']

    s3object = s3.Object(bucket_name, 'work/nosql/ddb.json')
    s3object.put(Body=(bytes(json.dumps(ddb_output).encode('UTF-8'))))

Task 2: update_oltp - This taks pulls the data from the DynamoDB table and performs the ETL process into the Postgresql database then saves the data into the datalake as csv files

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from ast import literal_eval

import boto3
import json
import itertools
import getpass
from dynamodb_json import json_util
import io

import psycopg2
from sqlalchemy import create_engine
from psycopg2.extensions import register_adapter, AsIs

def update_oltp():
    
    dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
    logs = dynamodb.Table('fp-logs')
    ddb_client = boto3.client('dynamodb')

    ddb_output = ddb_client.execute_statement(
    Statement="""
    SELECT *
    FROM "fp-logs"
    """
    )['Items']

    ddb_df = pd.DataFrame(json_util.loads(ddb_output))

    users_df = ddb_df[['userID', 'userFullName', 'gender', 'level']].copy()
    users_df = users_df.drop_duplicates(subset='userID', keep='last')
    users_df = users_df.sort_values(by='userID').reset_index(drop=True)

    locs_df = ddb_df[['userID', 'longitude', 'latitude']].copy()
    locs_df = locs_df.drop_duplicates(subset='userID', keep='last')
    locs_df = locs_df.sort_values(by='userID').reset_index(drop=True)

    songs = pd.read_csv('s3://finalproj-s3/landing/music/songs.csv')
    songs['artists'] = songs['artists'].apply(literal_eval)
    songs = songs.explode('artists')
    artists_df = songs[['artists']].dropna().drop_duplicates().reset_index(drop=True).reset_index().copy()
    artists_df = artists_df.rename(columns={'index':'artist_id'})

    songs_df = songs[['id', 'name', 'release_date', 'year', 
                       'duration_ms', 'acousticness', 'danceability',
                       'energy', 'explicit', 'instrumentalness', 'key',
                       'liveness', 'loudness', 'mode', 'popularity', 'speechiness',
                       'tempo', 'valence']].drop_duplicates()
    songs_df = songs_df.rename(columns={'name':'song_title'})

    songs_artists_df = songs[['name', 'artists']].reset_index(drop=True)
    songs_artists_df = songs_artists_df.rename(columns={'name':'song_title'})

    listens_df = ddb_df[['timestamp', 'userID', 'song_title']].copy()
    listens_df = listens_df.sort_values(by='timestamp').reset_index(drop=True)

    with open('creds.txt') as f:
        creds = f.read()
    
    engine = psycopg2.connect(
        database = 'love_database',
        host = "finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com",
        user = "love_keeper",
        password = creds
    )

    cur = engine.cursor()

    engine.set_session(autocommit=True)

    users_insert = ("""
    INSERT INTO users (
    userID,
    userFullName,
    gender,
    level
    )

    VALUES (%s, %s, %s, %s);
    """)

    for i, row in users_df.iterrows():
        try:
            cur.execute(users_insert, row)
            engine.commit()
        except:
            continue

    artists_insert = ("""
    INSERT INTO artists (
    artist_id,
    artists
    )

    VALUES (%s, %s);

    """)

    for i, row in artists_df.iterrows():
        try:
            cur.execute(artists_insert, row)
            engine.commit()
        except:
            continue

    songs_insert = ("""
    INSERT INTO songs (song_id, song_title, release_date, year, duration_ms,
    acousticness, danceability, energy, explicit, instrumentalness, key,
    liveness, loudness, mode, popularity, speechiness, tempo, valence)

    VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, 
             %s, %s, %s, %s, %s, %s, %s, %s);

    """)

    for i, row in songs_df.iterrows():
        try:
            cur.execute(songs_insert, tuple(row))
            engine.commit()
        except:
            continue

    listens_insert = ("""
    INSERT INTO listens (
    timestamp,
    userID,
    song_title
    )

    VALUES (%s, %s, %s);
    """)

    for i, row in listens_df.iterrows():
        try:
            cur.execute(listens_insert, row)
            engine.commit()
        except:
            continue

    locations_insert = ("""
    INSERT INTO locations (
    userID,
    longitude,
    latitude
    )

    VALUES (%s, %s, %s);
    """)

    for i, row in locs_df.iterrows():
        try:
            cur.execute(locations_insert, row)
            engine.commit()
        except:
            continue

    song_artist_insert = ("""
    INSERT INTO songs_artists (
    song_title,
    artists
    )

    VALUES (%s, %s);
    """)

    for i, row in songs_artists_df.iterrows():
        try:
            cur.execute(song_artist_insert, row)
            engine.commit()
        except:
            continue

    bucket_name = 'finalproj-s3'
    path = 'finalproj-s3/work/oltp'

    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    s3_client = boto3.client('s3')

    tables = ['artists', 'listens', 'locations', 'songs', 'users']

    for table in tables:
        try:
            query = f"""COPY {table} TO STDIN WITH (
            FORMAT csv, 
            DELIMITER ',', 
            QUOTE '"', 
            HEADER TRUE
            )"""

            file = io.StringIO()
            cur.copy_expert(query, file)
            s3_client.put_object(Bucket=bucket_name, Key=f'work/oltp/{table}.csv', Body=file.getvalue())
        except:
            continue

Task 3: update_olap - This taks pulls the data from the RDS tables and performs the ETL process into the Redshift database while saving the data into the datalake as csv files

In [ ]:
import pandas as pd
import psycopg2
from datetime import datetime

def update_olap():

    with open('creds.txt') as f:
        creds = f.read()
    
    engine = psycopg2.connect(
        database = 'love_database',
        host="finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com",
        user="love_keeper",
        password=creds
    )
    cur = engine.cursor()
    engine.set_session(autocommit=True)
    
    query = """
    SELECT *
    FROM users
    """
    users_df = pd.read_sql(query, engine)
    users_df = users_df.reset_index().rename(columns={'index': 'id'})
    users_df['id'] = users_df['id'].astype(int)
    
    query = """
    SELECT *
    FROM artists
    """
    artists_df = pd.read_sql(query, engine)
    
    query = """
    SELECT *
    FROM songs
    """
    songs_df = pd.read_sql(query, engine)

    songs_df['month'] = songs_df['release_date'].apply(lambda x: x.month)
    songs_df['day'] = songs_df['release_date'].apply(lambda x: x.day)

    songs_df = songs_df[['song_id', 'song_title', 'year', 'month', 'day', 
              'duration_ms', 'acousticness', 'danceability', 'energy', 
              'explicit', 'instrumentalness', 'key', 'liveness', 'loudness',
              'mode', 'popularity', 'speechiness', 'tempo', 'valence']]    
    
    query = """
    SELECT *
    FROM listens
    """
    listens_df = pd.read_sql(query, engine)
    listens_df['userid'] = listens_df['userid'].astype(int)

    listens_df['timestamp'] = listens_df['timestamp'].astype(int).apply(datetime.fromtimestamp)
    listens_df['year'] = listens_df['timestamp'].apply(lambda x: x.year)
    listens_df['month'] = listens_df['timestamp'].apply(lambda x: x.month)
    listens_df['day'] = listens_df['timestamp'].apply(lambda x: x.day)
    listens_df['hour'] = listens_df['timestamp'].apply(lambda x: x.hour)
    listens_df['minute'] = listens_df['timestamp'].apply(lambda x: x.minute)
    listens_df['second'] = listens_df['timestamp'].apply(lambda x: x.second)    
        
    query = """
    SELECT *
    FROM locations
    """
    loc_df = pd.read_sql(query, engine)
    loc_df['userid'] = loc_df['userid'].astype(float).astype(int)
    
    with open('creds2.txt') as f:
        creds2 = f.read()    
    
    engine_2 = psycopg2.connect(
    database="songsdb",
    host="new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com",
    user="songmanager",
    password=creds2,
    port="5439"
    )
    cur_2 = engine_2.cursor()
    engine_2.set_session(autocommit=True)
    
    users_insert = ("""
    INSERT INTO dim_users (id, 
    user_id, 
    full_name, 
    gender, 
    level
    )  
    VALUES (%s, %s, %s, %s, %s);
    """)
    for i, row in users_df.iterrows():
        try:
            cur_2.execute(users_insert, tuple(row.values))
            engine_2.commit()
        except:
            continue
    
    locations_insert = ("""
    INSERT INTO dim_locations ( 
    user_id, 
    longitude, 
    latitude
    )  
    VALUES (%s, %s, %s);
    """)
    for i, row in loc_df.iterrows():
        try:
            cur_2.execute(locations_insert, tuple(row.values))
            engine_2.commit()    
        except:
            continue

    artists_insert = ("""
    INSERT INTO dim_artists ( 
    artist_id,
    artist
    )  
    VALUES (%s, %s);
    """)

    for i, row in artists_df.iterrows():
        try:
            cur_2.execute(artists_insert, tuple(row.values))
            engine_2.commit()
        except:
            continue
            
    songs_insert = ("""
    INSERT INTO dim_songs ( 
    song_id, song_title, release_date,
    year, duration_ms, acousticness,
        danceability,
        energy,
        explicit,
        instrumentalness,
        key,
        liveness,
        loudness,
        mode,
        popularity,
        speechiness,
        tempo,
        valence
    )  
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s,
            %s, %s, %s, %s, %s, %s, %s, %s, %s);
    """)

    for i, row in songs_df.iterrows():
        try:
            cur_2.execute(songs_insert, tuple(row.values))
            engine_2.commit()
        except:
            continue
            
    listens_insert = ("""
    INSERT INTO fact_listens ( 
    user_id,
    song_id,
    song_title,
    timestamp
    ) VALUES (%s, %s, %s, %s); 
    """) 

    listens_merged_df = (pd.merge(listens_df, songs_df[['song_id', 'song_title']], 
                                 left_on='song_title', 
                                  right_on='song_title'))
    listens_merged_df = listens_merged_df[['userid', 'song_id', 'song_title', 'year', 'month', 'day',
                                           'hour', 'minute', 'second']]

    listens_merged_df['date_id'] = list(range(1, listens_merged_df.shape[0]+1))
    listens_merged_df_1 = listens_merged_df[['userid', 'song_id', 'song_title', 'date_id']]
    listens_merged_df_2 = listens_merged_df[['date_id', 'year', 'month', 'day',
                                             'hour', 'minute', 'second']]
    
    dates_insert = ("""
    INSERT INTO dim_dates (
    date_id,
    year,
    month,
    day,
    hour, 
    minute,
    second
    ) VALUES (%s, %s, %s, %s, %s, %s, %s)
    """)


    for i, row in listens_merged_df_2.astype(float).iterrows():
        try:
            cur_2.execute(dates_insert, tuple(row.values))
            engine_2.commit()  
            
        except:
            continue
    
    listens_insert = ("""
    INSERT INTO fact_listens ( 
    user_id,
    song_id,
    song_title,
    date_id
    ) VALUES (%s, %s, %s, %s); 
    """) 

    for i, row in listens_merged_df_1.iterrows():
        try:
            cur_2.execute(listens_insert, tuple(row.values))
            engine_2.commit()    
            
        except:
            continue
            
    dim_users = pd.read_sql('SELECT * FROM dim_users', engine_2)
    dim_locations = pd.read_sql('SELECT * FROM dim_locations', engine_2)    
    dim_artists = pd.read_sql('SELECT * FROM dim_artists', engine_2)       
    dim_songs = pd.read_sql('SELECT * FROM dim_songs', engine_2)     
    dim_dates = pd.read_sql('SELECT * FROM dim_dates', engine_2)       
    fact_listens = pd.read_sql('SELECT * FROM fact_listens', engine_2)      
    
    dim_users.to_csv('s3://finalproj-s3/work/olap/dim_users.csv')    
    dim_locations.to_csv('s3://finalproj-s3/work/olap/dim_locations.csv')  
    dim_artists.to_csv('s3://finalproj-s3/work/olap/dim_artists.csv')
    dim_songs.to_csv('s3://finalproj-s3/work/olap/dim_songs.csv')    
    dim_dates.to_csv('s3://finalproj-s3/work/olap/dim_dates.csv')    
    fact_listens.to_csv('s3://finalproj-s3/work/olap/fact_listens.csv')  

Task 4: recsys_locs - Pulls data from the OLTP database to generate the pairwise distances for each user. Tthe recommendation matrix is then saved as a csv file in the data lake.

In [ ]:
import psycopg2
import numpy as np
import pandas as pd
from scipy.spatial import distance

def recsys_locs():

    with open('creds.txt') as f:
        creds = f.read()   
    
    engine = psycopg2.connect(
    database = 'love_database',
    host = "finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com",
    user = "love_keeper",
    password = creds
    )

    cur = engine.cursor()

    engine.set_session(autocommit=True)

    query = """
    SELECT *
    FROM locations
    """
    loc_df = pd.read_sql(query, engine)

    loc_df = loc_df.drop_duplicates('userid', keep='last')

    lats = np.array(np.deg2rad(loc_df['latitude'].astype(float)))
    lons = np.array(np.deg2rad(loc_df['longitude'].astype(float)))
    diff_lat = lats[:,None] - lats
    diff_lon = lons[:,None] - lons
    d = np.sin(diff_lat/2)**2 + np.cos(lats[:,None])*np.cos(lats) * np.sin(diff_lon/2)**2
    d =  2 * 6371 * np.arcsin(np.sqrt(d))

    user_list = [uid for uid in loc_df['userid']]
    dist_df = pd.DataFrame(d)
    dist_df.index = user_list
    dist_df.columns = user_list

    dist_df.to_csv('s3://finalproj-s3/recommender/loc_match.csv', index_label='userID')

Task 5: recsys_locs - Pulls data from the OLTP database to generate the cosine similarity for the listens from each user. The recommendation matrix is then saved as a csv file in the data lake.

In [ ]:
import psycopg2
import numpy as np
import pandas as pd
from scipy.spatial import distance

def recsys_music():
    
    def generate_similarity_matrix(df_users):
        df_sim = pd.DataFrame()
        objects = df_users.iloc[:, 1:]

        for user_to_rec in range(len(objects)):
            query = df_users[df_users.index == user_to_rec].iloc[:, 1:]
            result_dist = distance.cdist(query, objects, metric='cosine')[0]
            df_sim[user_to_rec] = result_dist

        return df_sim

    with open('creds.txt') as f:
        creds = f.read()  
    
    engine = psycopg2.connect(
        database = 'love_database',
        host = "finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com",
        user = "love_keeper",
        password = creds
    )

    cur = engine.cursor()

    engine.set_session(autocommit=True)

    query = """
    SELECT *
    FROM songs
    """
    songs_df = pd.read_sql(query, engine)
    
    query = """
    SELECT *
    FROM listens
    """
    listens_df = pd.read_sql(query, engine)
    
    query = """
    SELECT *
    FROM users
    """
    users_df = pd.read_sql(query, engine)
    
    songs = songs_df.drop(['song_id', 'release_date'],axis=1)
    listens = listens_df[['userid', 'song_title']].drop_duplicates()
    match_df = pd.merge(listens, songs, left_on='song_title', right_on='song_title').drop(['song_title'],axis=1)
    
    song_matches = generate_similarity_matrix(match_df)
    
    user_list = [u for u in match_df['userid']]
    song_matches.index=user_list
    song_matches.columns=user_list
    
    song_matches.iloc[:-300,:-300].to_csv('s3://finalproj-s3/recommender/song_match.csv', index_label='userID')

Task 6: user_map - Pulls data from the OLTP database to create a mappings for userIDs and usernames

In [ ]:
import psycopg2
import numpy as np
import pandas as pd

def user_map():
    
    with open('creds.txt') as f:
        creds = f.read()      
    
    engine = psycopg2.connect(
    database = 'love_database',
    host = "finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com",
    user = "love_keeper",
    password = creds
    )

    cur = engine.cursor()

    engine.set_session(autocommit=True)
    
    query = """
    SELECT *
    FROM users
    """
    users_df = pd.read_sql(query, engine)
    
    users = users_df[['userid', 'userfullname']].copy()
    
    users.to_csv('s3://finalproj-s3/recommender/user_map.csv', index=False)

Task 7: dag_status - simply prints "Success" upon completion of the DAG

Conclusion¶

In the Tindify project, we are able to create an application that aims to match two individuals together on the basis of their proximity in location along with their similarity in the songs that they listen to. In generating the recommendations, we extracted the user information along with their respective locations via the OLTP database in order to filter out potential matches that go beyond the distance threshold defined by the user looking for matches. Those users that meet the distance criteria are then checked for compatibility by comparing their music preferences using the information-retrieval based recommender system. An OLAP database was also made for Tindify’s data scientists to monitor emerging trends in song preferences and also gain information about the characteristics of the users looking for potential matches through the application. Given that the user streaming data are in JSON log files and the song features are stored in a CSV format, a data lake was used in order to store the data.

Recommendations¶

To further improve the platform, we recommend the integration of different kinds of data that reveal user preferences instead of using information that the user just supplies or answers in a questionnaire (which has the tendency to be inaccurate, hence lowering the quality of the relationships formed through the application). Some examples of useful data which could help in matchmaking would include Netflix streaming data, Amazon e-commerce data, etc. As the number of users of the application increases, we recommend the use of Apache Kafka in order to be able to process huge streams of data from multiple users. Lastly, the platform could also utilize AWS Sagemaker in order to create deployable machine learning models in making model-based recommender systems.

References¶

[1] https://www.datingsitesreviews.com/staticpages/index.php?page=online-dating-history

[2] https://www.theguardian.com/media-network/media-network-blog/2014/jan/17/tinder-dating-psychology-technosexual

[3] https://medium.com/system-design-concepts/dating-application-system-design-aae411412267

[4] https://www.geeksforgeeks.org/haversine-formula-to-find-distance-between-two-points-on-a-sphere/

Appendix¶

In [55]:
%load_ext sql

Scripts to create OLTP tables¶

In [ ]:
with open('postgres_creds.txt') as f:
    conn_string = f.read()
In [ ]:
%sql $conn_string
In [ ]:
pw = getpass.getpass()
In [ ]:
%sql CREATE USER love_keeper WITH PASSWORD $pw;
In [ ]:
%sql GRANT love_keeper TO postgres;
In [ ]:
%sql CREATE DATABASE love_database OWNER love_keeper
In [ ]:
pw = getpass.getpass()
In [ ]:
%sql postgresql://love_keeper:$pw@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
In [ ]:
%%sql

DROP TABLE IF EXISTS users CASCADE;
DROP TABLE IF EXISTS songs CASCADE;
DROP TABLE IF EXISTS artists CASCADE;
DROP TABLE IF EXISTS listens CASCADE;
DROP TABLE IF EXISTS locations CASCADE;
DROP TABLE IF EXISTS songs_artists CASCADE;

DROP TABLE IF EXISTS users;
CREATE TABLE IF NOT EXISTS users (
    userID INTEGER PRIMARY KEY,
    userFullName VARCHAR,
    gender VARCHAR,
    level VARCHAR
);

DROP TABLE IF EXISTS songs CASCADE;
CREATE TABLE IF NOT EXISTS songs (
    song_id VARCHAR ,
    song_title VARCHAR PRIMARY KEY,
    release_date TIMESTAMP,
    year INTEGER,
    duration_ms INTEGER,
    acousticness FLOAT,
    danceability FLOAT,
    energy FLOAT,
    explicit INTEGER,
    instrumentalness FLOAT,
    key INTEGER,
    liveness FLOAT,
    loudness FLOAT,
    mode INTEGER,
    popularity INTEGER,
    speechiness FLOAT,
    tempo FLOAT,
    valence FLOAT 
);

DROP TABLE IF EXISTS listens;
CREATE TABLE IF NOT EXISTS listens (
    timestamp VARCHAR,
    userID INTEGER REFERENCES users,
    song_title VARCHAR REFERENCES songs
);

DROP TABLE IF EXISTS locations CASCADE;
CREATE TABLE IF NOT EXISTS locations (
    userID INTEGER REFERENCES users,
    longitude VARCHAR,
    latitude VARCHAR
);

DROP TABLE IF EXISTS artists CASCADE;
CREATE TABLE IF NOT EXISTS artists (
    artist_id VARCHAR,
    artists VARCHAR PRIMARY KEY
);

DROP TABLE IF EXISTS songs_artists;
CREATE TABLE IF NOT EXISTS songs_artists (
    artists VARCHAR REFERENCES artists,
    song_title VARCHAR REFERENCES songs
);

SET NOCOUNT ON;

Scripts to create OLAP Tables¶

In [ ]:
pw = getpass.getpass()
In [ ]:
%sql postgresql://awsuser:$pw@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/dev
In [ ]:
pw = getpass.getpass()
In [ ]:
%sql CREATE USER songmanager WITH PASSWORD $pw
In [ ]:
%sql GRANT songmanager TO awsuser;
In [ ]:
%sql CREATE DATABASE songsdb OWNER songmanager
In [ ]:
pw = getpass.getpass()
In [ ]:
%sql postgresql://songmanager:$pw@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
In [ ]:
%%sql

DROP TABLE IF EXISTS dim_users;

CREATE TABLE IF NOT EXISTS dim_users (
    id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    full_name TEXT NOT NULL,
    gender TEXT NOT NULL,
    level TEXT NOT NULL
);

DROP TABLE IF EXISTS dim_locations;

CREATE TABLE IF NOT EXISTS dim_locations (
    user_id INTEGER NOT NULL,
    longitude FLOAT,
    latitude FLOAT
);

DROP TABLE IF EXISTS dim_artists;

CREATE TABLE IF NOT EXISTS dim_artists (
    artist_id INTEGER NOT NULL,
    artist TEXT NOT NULL
);

DROP TABLE IF EXISTS dim_songs;

CREATE TABLE IF NOT EXISTS dim_songs (
    song_id TEXT NOT NULL,
    title TEXT NOT NULL,
    year INTEGER NOT NULL,
    month INTEGER NOT NULL,
    day INTEGER NOT NULL,
    duration_ms INTEGER NOT NULL,
    acousticness FLOAT NOT NULL,
    danceability FLOAT NOT NULL,
    energy FLOAT NOT NULL,
    explicit INTEGER,
    instrumentalness FLOAT NOT NULL,
    key INTEGER NOT NULL,
    liveness FLOAT NOT NULL,
    loudness FLOAT NOT NULL,
    mode INTEGER NOT NULL,
    popularity INTEGER NOT NULL,
    speechiness FLOAT NOT NULL,
    tempo FLOAT NOT NULL,
    valence FLOAT NOT NULL
);

DROP TABLE IF EXISTS dim_dates;

CREATE TABLE IF NOT EXISTS dim_dates (
    date_id INTEGER NOT NULL,
    year INTEGER NOT NULL,
    month INTEGER NOT NULL,
    day INTEGER NOT NULL,
    hour INTEGER NOT NULL,
    minute INTEGER NOT NULL,
    second INTEGER NOT NULL
);

DROP TABLE IF EXISTS fact_listens;
CREATE TABLE IF NOT EXISTS fact_listens (
    user_id INTEGER NOT NULL,
    song_id VARCHAR NOT NULL,
    title TEXT NOT NULL,
    date_id INTEGER NOT NULL
);

SET NOCOUNT ON;

OLTP Schema¶

In [ ]:
pw = getpass.getpass()
In [105]:
%sql postgresql://love_keeper:$pw@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
Out[105]:
'Connected: love_keeper@love_database'
In [60]:
%sql \d
 * postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
6 rows affected.
Out[60]:
Schema Name Type Owner
public artists table love_keeper
public listens table love_keeper
public locations table love_keeper
public songs table love_keeper
public songs_artists table love_keeper
public users table love_keeper
In [61]:
%sql \dt
 * postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
6 rows affected.
Out[61]:
Schema Name Type Owner
public artists table love_keeper
public listens table love_keeper
public locations table love_keeper
public songs table love_keeper
public songs_artists table love_keeper
public users table love_keeper
In [65]:
%sql SELECT COUNT (*) FROM artists;
 * postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
1 rows affected.
Out[65]:
count
1157
In [66]:
%sql SELECT COUNT (*) FROM listens;
 * postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
1 rows affected.
Out[66]:
count
4224
In [67]:
%sql SELECT COUNT (*) FROM locations;
 * postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
1 rows affected.
Out[67]:
count
396
In [68]:
%sql SELECT COUNT (*) FROM songs;
 * postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
1 rows affected.
Out[68]:
count
1926
In [69]:
%sql SELECT COUNT (*) FROM songs_artists;
 * postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
1 rows affected.
Out[69]:
count
17058
In [70]:
%sql SELECT COUNT (*) FROM users;
 * postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
1 rows affected.
Out[70]:
count
66
In [106]:
%sql SELECT * FROM artists LIMIT 100;
 * postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
   postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
100 rows affected.
Out[106]:
artist_id artists
0 David Guetta
1 Bebe Rexha
2 J Balvin
3 6ix9ine
4 Nicki Minaj
5 Kanye West
6 Morgan Wallen
7 Ariana Grande
8 Steve Aoki
9 BTS
10 Forrest.
11 Biskwiq
12 Plies
13 Kevin Gates
14 Shinedown
15 Lil Baby
16 Gunna
17 Lil Durk
18 NAV
19 Seven Lions
20 SLANDER
21 Dabin
22 Dylan Matthew
23 Nipsey Hussle
24 Megan Thee Stallion
25 6LACK
26 Yeek
27 Justin Timberlake
28 Chris Stapleton
29 Night Lovell
30 $uicideBoy$
31 YoungBoy Never Broke Again
32 DaniLeigh
33 Jakob Ogawa
34 KAROL G
35 Anuel AA
36 Banda Los Sebastianes
37 Shoreline Mafia
38 Morgan Evans
39 Kelsea Ballerini
40 Wallows
41 T3R Elemento
42 Paris Shadows
43 Trippie Redd
44 N.E.R.D
45 Rihanna
46 Drake
47 A$AP Rocky
48 Moby
49 TWICE
50 Tinashe
51 XXXTENTACION
52 Goody Grace
53 gnash
54 Mitchell Tenpenny
55 Playboi Carti
56 Young Thug
57 Tyga
58 Lil Skies
59 Ryce
60 Luke Combs
61 JID
62 Clean Bandit
63 Zara Larsson
64 Abhi The Nomad
65 Harrison Sands
66 Copper King
67 Choker
68 Metro Boomin
69 Travis Scott
70 Kodak Black
71 21 Savage
72 SHAUN
73 Conor Maynard
74 Sam Feldt
75 Sure Sure
76 Billie Eilish
77 TroyBoi
78 Phil Wickham
79 JPEGMAFIA
80 Aminé
81 Future
82 Juice WRLD
83 AViVA
84 物語シリーズ
85 Marc Anthony
86 Will Smith
87 Bad Bunny
88 5 Seconds of Summer
89 Alex Rose
90 Casper Magico
91 Nengo Flow
92 Bryant Myers
93 Noriel
94 Juhn Allstar
95 Miky Woodz
96 Jhay Cortez
97 Myke Towers
98 Louis The Child
99 NoMBe
In [107]:
%sql SELECT * FROM listens LIMIT 100;
 * postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
   postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
100 rows affected.
Out[107]:
timestamp userid song_title
1541105830 26 Stir Fry
1541106106 23 Sin (feat. Jaden Smith)
1541106132 23 Rock Salt and Nails (Live)
1541105830 26 Stir Fry
1541106496 23 La Forma en Que Me Miras
1541106106 23 Sin (feat. Jaden Smith)
1541107053 47 History
1541106132 23 Rock Salt and Nails (Live)
1541107734 23 Mockingbird
1541108520 26 Missin You Crazy
1541109015 47 Born To Be Yours
1541109125 47 REEL IT IN
1541109325 47 Promises (with Sam Smith)
1541110994 47 Victory Lap (feat. Stacy Barthe)
1541160187 35 WANTED U
1541105830 26 Stir Fry
1541160570 45 Promises (with Sam Smith)
1541160592 42 Hey Look Ma, I Made It
1541160692 60 Victory Lap (feat. Stacy Barthe)
1541160714 57 This Is It
1541106496 23 La Forma en Que Me Miras
1541160936 57 Level Up
1541161024 57 Born To Be Yours
1541161144 60 Level Up
1541161175 57 GTG
1541161381 57 Moon River
1541161550 57 REMEDY
1541106106 23 Sin (feat. Jaden Smith)
1541162141 2 Missin You Crazy
1541162370 2 This Is It
1541162691 2 Elastic
1541162912 2 1999
1541163225 60 Teenager in Love
1541107053 47 History
1541163717 57 History
1541163997 35 Born To Be Yours
1541106132 23 Rock Salt and Nails (Live)
1541165583 42 Missin You Crazy
1541107734 23 Mockingbird
1541108520 26 Missin You Crazy
1541166273 1 Born To Be Yours
1541166621 60 Diamond Teeth Samurai
1541166847 37 Butter Pecan - Bonus Track
1541109015 47 Born To Be Yours
1541168424 37 Level Up
1541168577 45 all the kids are depressed
1541109125 47 REEL IT IN
1541109325 47 Promises (with Sam Smith)
1541170090 45 RIP (feat. G-Eazy & Drew Love)
1541110994 47 Victory Lap (feat. Stacy Barthe)
1541160187 35 WANTED U
1541172424 57 Nice Guy (with Jessie Reyez)
1541172593 32 Beyond
1541172804 1 Teenager in Love
1541173059 61 The Reason
1541173221 28 Cut My Lip
1541173396 32 History
1541105830 26 Stir Fry
1541174308 27 Rock Salt and Nails (Live)
1541174309 27 Beautiful
1541174356 28 Ugotme
1541174633 27 Promises (with Sam Smith)
1541174803 27 Haunt U
1541174900 28 The Flute Song
1541175054 61 Hunnybee
1541175153 14 Silly Girl
1541175433 29 Forever Ever
1541175530 29 Had Me By Halftime
1541175733 29 Missin You Crazy
1541175894 29 Ugotme
1541176076 14 History
1541176254 14 Forgive You
1541176327 14 Airplane Mode
1541176371 29 Hey Look Ma, I Made It
1541176739 32 Rock Salt and Nails (Live)
1541160570 45 Promises (with Sam Smith)
1541160592 42 Hey Look Ma, I Made It
1541177497 25 The Other
1541177709 25 Singularity
1541178038 25 Talk Up (feat. Jay-Z)
1541178048 25 Forgive You
1541179117 61 Desperate Man
1541160692 60 Victory Lap (feat. Stacy Barthe)
1541207123 63 This Is It
1541207150 34 Missin You Crazy
1541160714 57 This Is It
1541106496 23 La Forma en Que Me Miras
1541160936 57 Level Up
1541254670 63 Someday
1541161024 57 Born To Be Yours
1541259368 59 Hold My Girl
1541260356 54 Victory Lap (feat. Stacy Barthe)
1541260488 54 The Flute Song
1541161144 60 Level Up
1541161175 57 GTG
1541261259 42 Shoulda Named It After Me
1541261432 42 Someday
1541261690 31 REMEDY
1541161381 57 Moon River
1541262293 31 Mockingbird
In [108]:
%sql SELECT * FROM locations LIMIT 100;
 * postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
   postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
100 rows affected.
Out[108]:
userid longitude latitude
1 121.0577507 14.24777986
2 121.0244985 14.39202707
3 121.0513364 14.66512714
4 121.0323808 14.6287284
5 121.0546188 14.65372321
6 123.8736614 10.32727586
7 121.0598459 14.24851989
8 121.0553765 14.2407114
9 123.8694782 10.30912542
10 121.0327252 14.41835138
11 121.0377921 14.2298179
12 121.0364086 14.65605779
13 121.0450169 14.23488394
14 123.8524017 10.33690971
15 121.0709862 14.24565958
16 123.8014537 10.34546434
17 121.0614733 14.25084438
18 121.0350828 14.64738321
19 123.8582127 10.31223214
20 121.0308051 14.65929261
21 121.0125751 14.42874196
22 121.0265459 14.39094848
23 121.0272516 14.66177014
24 121.0639547 14.2394838
25 123.7952983 10.31724588
26 121.0330756 14.64205712
27 121.0732049 14.25466135
28 123.8251368 10.34494141
29 123.8054298 10.31592916
30 120.9887098 14.44612893
31 121.0488945 14.23072357
32 121.0717828 14.25677277
33 123.8028315 10.32714818
34 121.0566622 14.6427324
35 121.063723 14.24150137
36 123.8747406 10.3220519
37 121.0253021 14.38489132
38 123.8645445 10.31884932
39 121.0586424 14.23406803
40 121.0385185 14.39207613
41 121.0554547 14.24165886
42 121.029903 14.4049765
43 123.8328213 10.31070784
44 121.0056655 14.44359233
45 121.0756029 14.25468692
46 121.0593643 14.65222826
47 121.0398971 14.6628413
48 123.8737461 10.33265349
49 121.0474131 14.66397988
50 121.0345234 14.44554349
51 121.0414076 14.64044368
52 123.8708662 10.3218514
53 121.0464644 14.65888188
54 121.0288163 14.43105088
55 121.0158293 14.6647035
56 121.059258 14.6319537
57 121.0503601 14.64953264
58 121.0298833 14.40808912
59 121.0627138 14.23774099
60 121.017674 14.43148611
61 123.83282 10.30009076
62 121.0201485 14.3833865
63 121.0398209 14.43897897
64 121.0035655 14.44066566
65 121.0397764 14.44222001
66 121.0318088 14.62332026
1 121.0577507 14.24777986
2 121.0244985 14.39202707
3 121.0513364 14.66512714
4 121.0323808 14.6287284
5 121.0546188 14.65372321
6 123.8736614 10.32727586
7 121.0598459 14.24851989
8 121.0553765 14.2407114
9 123.8694782 10.30912542
10 121.0327252 14.41835138
11 121.0377921 14.2298179
12 121.0364086 14.65605779
13 121.0450169 14.23488394
14 123.8524017 10.33690971
15 121.0709862 14.24565958
16 123.8014537 10.34546434
17 121.0614733 14.25084438
18 121.0350828 14.64738321
19 123.8582127 10.31223214
20 121.0308051 14.65929261
21 121.0125751 14.42874196
22 121.0265459 14.39094848
23 121.0272516 14.66177014
24 121.0639547 14.2394838
25 123.7952983 10.31724588
26 121.0330756 14.64205712
27 121.0732049 14.25466135
28 123.8251368 10.34494141
29 123.8054298 10.31592916
30 120.9887098 14.44612893
31 121.0488945 14.23072357
32 121.0717828 14.25677277
33 123.8028315 10.32714818
34 121.0566622 14.6427324
In [109]:
%sql SELECT * FROM songs LIMIT 100;
 * postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
   postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
100 rows affected.
Out[109]:
song_id song_title release_date year duration_ms acousticness danceability energy explicit instrumentalness key liveness loudness mode popularity speechiness tempo valence
3MoV1UsAJmz64LHqyiRMp0 Say My Name 2018-09-14 00:00:00 2018 198947 0.137 0.703 0.66 0 0.0 10 0.192 -5.119 0 66 0.123 95.031 0.79
5FlmoOiXokZHJvESrFbsyg MAMA 2018-11-27 00:00:00 2018 192405 0.0893 0.761 0.672 1 9.77e-06 1 0.14 -5.887 1 68 0.212 160.077 0.437
5x7Fb9wMOtxd7eZzQ9EzVd Gone Girl 2018-04-27 00:00:00 2018 163453 0.287 0.527 0.868 0 0.0 2 0.0885 -2.653 1 61 0.0416 172.086 0.77
5YeHLHDdQ4nKHk81XFWhCU successful 2018-08-17 00:00:00 2018 227387 0.107 0.847 0.603 1 7.4e-06 0 0.106 -4.607 0 67 0.0397 114.045 0.735
66XDf77gBcfQKi4I2vIbTj Waste It On Me 2018-11-09 00:00:00 2018 192537 0.252 0.669 0.684 0 0.0 7 0.72 -4.595 1 70 0.0585 96.097 0.466
1gewwJHPkvysxLEGgOKQuB Why Not Me 2018-07-01 00:00:00 2018 195000 0.48 0.905 0.721 0 0.0 7 0.528 -6.922 1 64 0.31 145.958 0.695
2CY9aDyI5NQLSuvU2d2VzX All Thee Above (feat. Kevin Gates) 2018-10-12 00:00:00 2018 212400 0.00988 0.736 0.568 1 0.0 1 0.0731 -5.973 1 61 0.135 145.93 0.514
1OKQIvaO7yn7R2BpdUFG7D GET UP 2018-05-04 00:00:00 2018 245787 0.0174 0.552 0.935 0 1.98e-06 0 0.116 -4.096 1 63 0.0623 74.954 0.38
6PoKfDY78ih5OveWirggRX Off White VLONE (Lil Baby & Gunna feat. Lil Durk & NAV) 2018-10-05 00:00:00 2018 187465 0.105 0.859 0.506 1 0.0 10 0.113 -8.115 0 63 0.418 140.973 0.153
27YD36FUikKxbp3bKSiKGZ First Time (feat. Dylan Matthew) 2018-10-12 00:00:00 2018 297782 0.251 0.422 0.677 0 3.55e-06 11 0.16 -5.76 1 63 0.0365 150.131 0.146
7yB6GF5kgPEoeGE4EAWiJU Blue Laces 2 2018-02-16 00:00:00 2018 250920 0.0276 0.569 0.836 1 0.0 4 0.479 -3.831 0 62 0.226 75.701 0.0927
2Xc9MR2NMc5IKPFKaEjt1k Hot Girl 2018-12-21 00:00:00 2018 196282 0.000278 0.761 0.637 1 0.0 10 0.251 -5.432 0 65 0.313 185.904 0.374
29ZZAShKhwRUuKULSNtxX6 Switch 2018-09-14 00:00:00 2018 207071 0.0409 0.66 0.658 1 0.00595 7 0.102 -6.127 1 67 0.0387 176.701 0.153
3uwrczwM9A3xX63jlcRhIC Cleaner Air 2018-10-30 00:00:00 2018 128129 0.876 0.717 0.797 1 0.000694 7 0.0836 -5.393 1 64 0.0368 77.492 0.849
09ts3GnICqYEU5PkQCpJK3 Say Something (feat. Chris Stapleton) 2018-01-25 00:00:00 2018 278893 0.103 0.707 0.632 0 1.09e-05 10 0.0841 -7.031 1 60 0.0789 97.04 0.372
5mrOnabhXnvCf5vFF7NJhG Joan of Arc 2018-06-08 00:00:00 2018 145161 0.00375 0.915 0.668 1 0.00241 1 0.253 -2.52 1 64 0.04 124.053 0.0387
2ulXW6DPrhTDt7ZqqYDdhu Genie 2018-06-08 00:00:00 2018 194792 0.286 0.598 0.605 1 0.0 7 0.0875 -4.92 0 62 0.452 159.517 0.513
2fwkl5Z8ujDtAh6qxEA5x0 Lil Bebe (feat. Lil Baby) - Remix 2018-11-30 00:00:00 2018 174720 0.0481 0.856 0.485 1 0.0 5 0.108 -5.519 0 65 0.0597 136.005 0.0929
41o2ydrj7Xm9Yt5odIBqq4 Velvet Light 2018-04-05 00:00:00 2018 141587 0.75 0.67 0.419 0 0.135 6 0.0976 -10.523 0 63 0.0751 77.919 0.513
6q8Lb50EtqNeeJXXe8mMAH Culpables 2018-09-14 00:00:00 2018 226627 0.279 0.728 0.801 0 4.71e-06 8 0.143 -2.942 0 69 0.0683 130.059 0.416
1uqjLVaSyTkdDvfGAmXrHo A Través Del Vaso 2018-08-31 00:00:00 2018 180120 0.621 0.649 0.529 0 0.0 2 0.0563 -4.895 0 65 0.0398 144.869 0.964
2YVt7k1qZXFhzYDe8FMDW6 No Mentions 2018-09-21 00:00:00 2018 220260 0.27 0.803 0.54 1 0.0 1 0.0819 -6.862 0 61 0.318 139.989 0.345
4k0ZX0KlMBEOlTp5vexIrT Bottle Service 2018-05-04 00:00:00 2018 156837 0.113 0.849 0.582 1 0.0 8 0.152 -6.407 1 62 0.214 100.049 0.516
07sa7qkxpemze5BBE7jufs Dance with Me (feat. Kelsea Ballerini) 2018-10-12 00:00:00 2018 199501 0.481 0.611 0.519 0 4.82e-06 1 0.107 -9.854 1 63 0.035 147.981 0.462
3VzULnaqPKf1deQyZ5W4Ah 1980s Horror Film II 2018-10-11 00:00:00 2018 235272 0.0038 0.593 0.658 0 0.0126 2 0.0804 -4.957 1 65 0.0304 116.021 0.27
0laoKkCyhp5HW3wGejhPV5 En Menos de Un Minuto 2018-11-16 00:00:00 2018 177714 0.554 0.702 0.621 1 0.0 7 0.226 -6.884 1 65 0.0296 114.532 0.581
2J84DvAAI8ph5DYojk3HPt GONE (feat. Trippie Redd) 2018-04-25 00:00:00 2018 190893 0.00107 0.712 0.731 1 0.0 9 0.219 -4.555 1 64 0.0335 118.016 0.876
739vCwA3EpBSkk3uDsI2wB Lemon (feat. Drake) - Drake Remix 2018-03-18 00:00:00 2018 226107 0.000738 0.777 0.657 1 3.34e-06 1 0.0932 -6.434 1 64 0.237 189.992 0.245
1YmF9PvLhIISIANoMLIYGq A$AP Forever (feat. Moby) 2018-04-05 00:00:00 2018 233061 0.224 0.469 0.78 1 0.0 10 0.13 -6.49 0 67 0.122 125.999 0.444
1IX47gefluXmKX4PrTBCRM What is Love 2018-07-09 00:00:00 2018 208240 0.0988 0.574 0.909 0 0.0 8 0.264 -1.572 1 72 0.0655 169.964 0.565
0Sbeihd75SGitymHS8Bqbh I Hate Love Songs 2018-10-26 00:00:00 2018 191187 0.252 0.386 0.446 0 0.0 11 0.18 -7.807 1 62 0.0297 179.995 0.4
7D8aQaRzoi9Qzz5yerVK5b Throw A Fit 2018-07-27 00:00:00 2018 158757 0.00597 0.888 0.586 1 0.0365 0 0.153 -6.745 1 66 0.0604 150.051 0.401
4axYWgXASdsCBHuwInAubT I don’t let go 2018-12-07 00:00:00 2018 121630 0.591 0.797 0.293 1 0.0724 5 0.148 -7.479 1 65 0.08 80.035 0.478
1746Hs7bQJMMYAZvVYM3ig Two Shots (feat. gnash) 2018-03-23 00:00:00 2018 179915 0.0542 0.768 0.408 1 0.0 4 0.106 -8.551 0 61 0.0714 129.96 0.0905
5SO7pvSFiXwU0e1z7C4X7H Alcohol You Later 2018-12-14 00:00:00 2018 186600 0.0259 0.671 0.597 0 0.0 8 0.116 -7.011 1 62 0.0351 100.039 0.422
5O9zs6G6RcB6yP1OKwnwiM Choppa Won't Miss (feat. Young Thug) 2018-05-11 00:00:00 2018 217960 0.0182 0.858 0.661 1 0.0 0 0.115 -6.544 1 63 0.319 163.056 0.611
55GiOwtxNajXVbpjjIzMnD SWISH 2018-07-25 00:00:00 2018 194936 0.0831 0.884 0.541 1 0.0 11 0.118 -5.608 1 70 0.221 105.009 0.274
0chpIFSjgfd1CPrAxM7PWp No Rest 2018-08-13 00:00:00 2018 135535 0.0186 0.655 0.498 1 4.47e-06 9 0.139 -13.147 1 64 0.0479 162.861 0.508
5awvelCGpDQHwgZem0ira9 Lately 2018-11-01 00:00:00 2018 207000 0.213 0.718 0.632 0 1.28e-05 0 0.283 -4.011 1 64 0.145 160.039 0.859
0h31TZMlv0ZLc5yppKngwk Memories Are Made Of 2018-06-01 00:00:00 2018 216667 0.0159 0.694 0.718 0 1.61e-06 2 0.287 -6.188 1 62 0.0322 121.078 0.364
46Lx5epWOQBA3J86ovnxmV Workin Out 2018-11-26 00:00:00 2018 226859 0.384 0.818 0.57 1 0.000102 10 0.19 -8.067 0 64 0.285 126.938 0.561
72gv4zhNvRVdQA0eOenCal Symphony (feat. Zara Larsson) 2018-11-30 00:00:00 2018 212733 0.239 0.715 0.605 0 1.41e-05 0 0.189 -5.128 0 74 0.0428 122.956 0.454
0G2wimhVoDYXbQ6csDxtSf Sex n' Drugs 2018-02-09 00:00:00 2018 217778 0.299 0.831 0.32 0 0.0 7 0.0757 -7.916 1 65 0.143 80.935 0.319
6OpU3cvY19M2hFxTbLKy5L Juno 2018-08-03 00:00:00 2018 231681 0.921 0.442 0.267 1 0.00026 7 0.148 -12.857 1 63 0.0365 120.075 0.116
42k9e2hgl98IdgkXLg4pxs No More (feat. Travis Scott, Kodak Black & 21 Savage) 2018-11-06 00:00:00 2018 265120 0.413 0.636 0.501 1 0.0 11 0.339 -7.26 0 65 0.329 82.309 0.196
1ZLrDPgR7mvuTco3rQK8Pk Way Back Home (feat. Conor Maynard) - Sam Feldt Edit 2018-12-21 00:00:00 2018 192453 0.269 0.737 0.747 0 0.0 10 0.219 -4.818 1 75 0.0323 105.943 0.447
6PuoZT4kgw5DrUEdnQ6e01 This Must Be The Place 2018-01-12 00:00:00 2018 219261 0.349 0.886 0.371 0 0.00138 5 0.0985 -10.542 1 63 0.0465 115.026 0.928
1T3E8z6VMXSsACrBSQHTkN MyBoi - TroyBoi Remix 2018-03-09 00:00:00 2018 211304 0.00626 0.879 0.58 0 0.0433 2 0.0653 -4.797 1 68 0.0883 92.004 0.312
6nVm313QmsPlNllntTart1 Living Hope 2018-08-03 00:00:00 2018 327000 0.0304 0.211 0.558 0 0.0 3 0.0978 -6.232 1 61 0.041 143.452 0.317
6XyxCBp6x3jvtxXvMN5sAA 1539 N. Calvert 2018-01-19 00:00:00 2018 157259 0.278 0.863 0.717 1 0.0826 4 0.0991 -3.143 0 64 0.17 152.05 0.357
6VS7wKwtvL2FvTupYSWZ9e Invincible 2018-12-14 00:00:00 2018 196387 0.141 0.724 0.6 0 0.000218 10 0.104 -4.391 0 65 0.29 86.984 0.466
1hHe7cirScEJWgAS0JFqTL 7 Am Freestyle 2018-10-19 00:00:00 2018 192907 0.0185 0.951 0.401 1 0.0 0 0.149 -8.92 0 63 0.411 129.976 0.539
1jfBIylYmLMklQOpbWT5s0 Blame It On The Kids 2018-11-16 00:00:00 2018 196341 0.364 0.719 0.44 0 0.0 6 0.163 -6.266 0 67 0.166 133.013 0.505
5P8lyudWE7HQxb4ludLbEm Renai Circulation 2018-11-02 00:00:00 2018 255040 0.0524 0.814 0.785 0 0.00111 1 0.111 -5.679 0 68 0.0396 120.009 0.941
3z1QH1O1h15lFROjcECqgi A Long Way 2018-06-01 00:00:00 2018 217280 0.0194 0.483 0.713 0 0.000151 2 0.316 -5.027 1 62 0.0295 161.905 0.594
2wRkBumdItthjYP9XknImg Est√° Rico 2018-09-28 00:00:00 2018 215151 0.266 0.817 0.74 0 0.0 11 0.0992 -4.566 0 69 0.283 130.04 0.674
2vHfabj6nFebekTYODqntl Want You Back 2018-06-15 00:00:00 2018 173082 0.00699 0.618 0.867 1 1.03e-06 0 0.202 -4.915 1 68 0.0439 98.03 0.579
45WkcBtgPS8xulUwEg63X1 TOGETHER 2018-08-15 00:00:00 2018 214707 0.469 0.622 0.375 1 1.28e-05 4 0.185 -8.115 0 63 0.208 132.887 0.357
6xqbec9MHRmW7qtEINAlRx Darte Remix 2018-07-27 00:00:00 2018 512093 0.144 0.81 0.631 1 1.32e-06 4 0.451 -6.541 0 70 0.0938 154.976 0.573
7f4bRMqRKRfUNPW2gbTHSt Save Me From Myself (with NoMBe & Big Gigantic) 2018-10-26 00:00:00 2018 211686 0.0198 0.673 0.676 0 1.26e-05 4 0.428 -4.888 0 62 0.216 82.979 0.391
1LV6zxUFIUyePwsdFHNIYA El de la Gorra 27 2018-04-20 00:00:00 2018 200071 0.781 0.663 0.785 0 2.33e-06 11 0.113 -3.874 0 67 0.0365 107.455 0.84
3FskQrDXcY24ur2fCvz35O Ye 2018-01-26 00:00:00 2018 231618 0.0181 0.503 0.592 1 0.00014 5 0.169 -5.923 0 72 0.444 201.85 0.344
3TgelNfEK1DnYrsHOBxaVJ El De Los Lentes Gucci 2018-07-10 00:00:00 2018 125412 0.37 0.765 0.539 0 0.0 11 0.039 -5.028 0 68 0.0437 99.94 0.965
1D23GEeQ32aqLw5ZnY3Ivp COOLEST MONKEY IN THE JUNGLE 2018-05-11 00:00:00 2018 152213 0.324 0.893 0.676 1 0.0 9 0.0968 -4.867 0 63 0.227 125.923 0.804
4BiiOzZCrXEzHRLYcYFiD5 Hope 2018-12-14 00:00:00 2018 180120 0.488 0.773 0.699 0 3.6e-06 1 0.0814 -5.982 0 71 0.0958 104.941 0.513
6e13443Ve7RGcAUScTgYtl The Last Of The Real Ones 2018-01-19 00:00:00 2018 230400 0.00599 0.576 0.872 0 0.0 8 0.188 -5.293 1 63 0.0471 115.042 0.559
2yI8omCrTAyrTcK7NRppu7 Ocean (feat. Jacquees) 2018-06-22 00:00:00 2018 219437 0.268 0.709 0.264 1 0.0 1 0.24 -17.696 0 62 0.186 133.858 0.341
2lRe5wBRm4xaSKTbDn2vLD Trippy (feat. J. Cole) 2018-11-16 00:00:00 2018 323587 0.0707 0.57 0.65 1 0.0 2 0.339 -8.538 1 65 0.326 96.767 0.174
1j8gLQNfFOI0vs8qcVTWwa ¿Por Qué Cambiaste De Opinión? 2018-07-20 00:00:00 2018 241907 0.446 0.735 0.49 0 3.52e-06 5 0.129 -4.937 1 69 0.0299 135.159 0.89
4n1bdaKwynQndm47x5HqWX Blue Tint 2018-06-29 00:00:00 2018 162800 0.013 0.888 0.372 1 0.0 7 0.184 -10.254 0 64 0.358 133.998 0.123
2u6XJ2PPaCyyRJqC0x6TMH Through the Storm 2018-06-08 00:00:00 2018 163567 0.631 0.881 0.581 1 0.0 5 0.0986 -5.162 1 61 0.0901 78.508 0.876
6euzJRzwBig3NESMppkxOK All I Got 2018-02-16 00:00:00 2018 218704 0.12 0.667 0.509 0 1.91e-06 1 0.269 -6.584 1 64 0.0348 104.995 0.289
4uTFofZ26M2VgCVq3ixUUY Haunt U 2018-02-24 00:00:00 2018 100260 0.393 0.463 0.632 1 0.0 7 0.127 -9.577 1 68 0.0525 163.71 0.0894
4ToogKQq58R9TvO40qRuxi I Wanna Know 2018-07-27 00:00:00 2018 206727 0.501 0.432 0.85 0 6.59e-06 10 0.203 -4.417 1 63 0.0771 82.607 0.109
1KnJwmgLyBfB7u920qLbEq Romance Garbage 2018-07-07 00:00:00 2018 122070 0.152 0.467 0.553 0 0.683 6 0.291 -9.692 0 64 0.1 144.943 0.39
0FbSpY8RPxTRxi1Fpj9Dz9 Booty 2018-02-27 00:00:00 2018 160038 0.000831 0.938 0.549 1 0.000243 1 0.082 -8.368 1 62 0.098 144.005 0.412
08zAFXhfyIxq9359NHksdP Do Not Disturb (feat. Lil Yachty & Offset) 2018-04-13 00:00:00 2018 157557 0.203 0.878 0.647 1 0.00185 10 0.335 -4.566 0 64 0.0659 131.027 0.212
72B1Omo2kmEst4YKO8vOB6 Scars 2018-03-02 00:00:00 2018 238307 0.048 0.417 0.524 0 0.0 10 0.103 -6.6 1 62 0.0326 75.266 0.152
2SPEg45ZaUFJCg9GwNmRvw None Of My Business 2018-10-19 00:00:00 2018 184005 0.428 0.793 0.488 0 0.0 2 0.0823 -5.807 0 68 0.0462 133.957 0.5
3ZsLuJ8uU0lE6rGTkDaqEg Alone Together 2018-06-22 00:00:00 2018 171440 0.00527 0.541 0.87 0 0.0 1 0.0867 -4.741 1 62 0.122 179.975 0.665
2TLDilmv1UmRTfyb7f2Hrx One That Got Away 2018-06-01 00:00:00 2018 218733 0.00303 0.565 0.825 0 0.0 10 0.0779 -4.245 1 61 0.0417 160.003 0.609
4qjLvvBh5ZeKEPyShKRf06 Narcissist (feat. The 1975) 2018-08-28 00:00:00 2018 195502 0.218 0.543 0.682 1 1.53e-06 3 0.085 -5.81 1 67 0.113 179.969 0.424
41a7dZcq30Ss5kPMayWRV0 That’s How You Feel 2018-06-29 00:00:00 2018 157605 0.117 0.85 0.473 1 0.0105 10 0.108 -5.747 0 65 0.209 85.454 0.38
48x8fBWfyeSMzLl5gbsjr9 COME THRU 2018-10-26 00:00:00 2018 153169 0.163 0.66 0.596 1 0.86 0 0.109 -8.207 1 65 0.0358 77.026 0.0757
1FDV80LC5JnTDAEjPDovER Worthy Of You 2018-12-06 00:00:00 2018 190619 0.439 0.451 0.452 0 0.0 1 0.478 -9.363 0 66 0.0574 57.523 0.386
33DzKnwuBE6lfOiADwzd5E TOOTIMETOOTIMETOOTIME 2018-11-30 00:00:00 2018 207851 0.177 0.667 0.824 0 0.0152 8 0.0933 -5.867 1 68 0.0665 114.998 0.518
1x3W8RZxW94lrVGhP95qA6 You Are The Reason - Duet Version 2018-11-30 00:00:00 2018 190760 0.382 0.403 0.356 0 0.0 10 0.148 -7.547 1 72 0.0298 171.454 0.151
3lWzVNe1yFZlkeBBzUuZYu Bella Ciao - M√∫sica Original de la Serie la Casa de Papel/ Money Heist 2018-04-20 00:00:00 2018 139845 0.538 0.46 0.564 0 0.0 9 0.191 -10.329 0 78 0.0343 135.178 0.962
18xmRkPWE0sGUzFyNNAAhN Dying in LA 2018-06-22 00:00:00 2018 229053 0.876 0.346 0.35 0 1.98e-06 2 0.0708 -4.988 1 65 0.0328 126.444 0.315
6DuEfVcF8bmEkiFKr41SiM Prayed For You - Acoustic 2018-11-16 00:00:00 2018 205915 0.63 0.713 0.386 0 0.0 4 0.151 -6.995 1 61 0.0291 139.968 0.52
5sWbwccBcyHsg5LEKWGZo9 BRACKETS 2018-04-20 00:00:00 2018 315771 0.19 0.675 0.567 1 1.77e-06 10 0.175 -9.147 0 63 0.263 84.039 0.658
3BLKY8w5XTehYgLb3cFZgg Deadman's Curve (Live) 2018-06-29 00:00:00 2018 207453 0.731 0.392 0.496 0 0.0 2 0.908 -9.664 1 60 0.203 172.243 0.604
6dJJT9h7FzjenIaa31bw8p Yo Fui El Elegante 2018-02-12 00:00:00 2018 146413 0.296 0.785 0.831 0 0.0 9 0.176 -4.439 1 66 0.193 134.829 0.581
7IOxauZfx5l18mYfX0GZm6 Knees 2018-06-22 00:00:00 2018 206581 0.0781 0.777 0.577 0 1.02e-06 11 0.0869 -6.872 0 68 0.0357 109.019 0.381
0r9d60ZYPAlFHq8qnUtkXN Madura (feat. Bad Bunny) 2018-04-13 00:00:00 2018 190909 0.138 0.878 0.555 0 1.34e-06 1 0.151 -7.639 0 70 0.202 88.029 0.751
645YBsxhFPSOKxGep3uHLv Love Maze 2018-05-18 00:00:00 2018 221529 0.0487 0.636 0.876 0 0.0 9 0.0728 -4.652 1 70 0.0617 149.932 0.759
6Z7m5DLNPOaPg9T6nXK0rQ DNA 2018-08-24 00:00:00 2018 223123 0.0223 0.599 0.774 0 0.0 1 0.0669 -4.014 0 72 0.0541 129.819 0.696
48XfojJC87VKLlUotYV0Gm The Truth Untold 2018-05-18 00:00:00 2018 242293 0.372 0.57 0.422 0 0.0 1 0.127 -7.561 0 70 0.0422 113.007 0.169
46za9pKXWkesH9DCrBbtNU Low 2018-04-28 00:00:00 2018 233963 0.689 0.746 0.438 1 0.103 4 0.126 -7.016 0 64 0.0326 119.991 0.042
50Hv5NZIM0puIUoBttjpfb King of the Clouds 2018-06-22 00:00:00 2018 160747 0.237 0.285 0.922 0 0.0 2 0.159 -1.932 0 64 0.247 198.063 0.472
In [110]:
%sql SELECT * FROM songs_artists LIMIT 100;
 * postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
   postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
100 rows affected.
Out[110]:
artists song_title
David Guetta Say My Name
Bebe Rexha Say My Name
J Balvin Say My Name
6ix9ine MAMA
Nicki Minaj MAMA
Kanye West MAMA
Morgan Wallen Gone Girl
Ariana Grande successful
Steve Aoki Waste It On Me
BTS Waste It On Me
Forrest. Why Not Me
Biskwiq Why Not Me
Plies All Thee Above (feat. Kevin Gates)
Kevin Gates All Thee Above (feat. Kevin Gates)
Shinedown GET UP
Lil Baby Off White VLONE (Lil Baby & Gunna feat. Lil Durk & NAV)
Gunna Off White VLONE (Lil Baby & Gunna feat. Lil Durk & NAV)
Lil Durk Off White VLONE (Lil Baby & Gunna feat. Lil Durk & NAV)
NAV Off White VLONE (Lil Baby & Gunna feat. Lil Durk & NAV)
Seven Lions First Time (feat. Dylan Matthew)
SLANDER First Time (feat. Dylan Matthew)
Dabin First Time (feat. Dylan Matthew)
Dylan Matthew First Time (feat. Dylan Matthew)
Nipsey Hussle Blue Laces 2
Megan Thee Stallion Hot Girl
6LACK Switch
Yeek Cleaner Air
Justin Timberlake Say Something (feat. Chris Stapleton)
Chris Stapleton Say Something (feat. Chris Stapleton)
Night Lovell Joan of Arc
$uicideBoy$ Joan of Arc
YoungBoy Never Broke Again Genie
DaniLeigh Lil Bebe (feat. Lil Baby) - Remix
Lil Baby Lil Bebe (feat. Lil Baby) - Remix
Jakob Ogawa Velvet Light
KAROL G Culpables
Anuel AA Culpables
Banda Los Sebastianes A Través Del Vaso
YoungBoy Never Broke Again No Mentions
Shoreline Mafia Bottle Service
Morgan Evans Dance with Me (feat. Kelsea Ballerini)
Kelsea Ballerini Dance with Me (feat. Kelsea Ballerini)
Wallows 1980s Horror Film II
T3R Elemento En Menos de Un Minuto
Paris Shadows GONE (feat. Trippie Redd)
Trippie Redd GONE (feat. Trippie Redd)
N.E.R.D Lemon (feat. Drake) - Drake Remix
Rihanna Lemon (feat. Drake) - Drake Remix
Drake Lemon (feat. Drake) - Drake Remix
A$AP Rocky A$AP Forever (feat. Moby)
Moby A$AP Forever (feat. Moby)
TWICE What is Love
Kelsea Ballerini I Hate Love Songs
Tinashe Throw A Fit
XXXTENTACION I don’t let go
Goody Grace Two Shots (feat. gnash)
gnash Two Shots (feat. gnash)
Mitchell Tenpenny Alcohol You Later
Playboi Carti Choppa Won't Miss (feat. Young Thug)
Young Thug Choppa Won't Miss (feat. Young Thug)
Tyga SWISH
Lil Skies No Rest
Forrest. Lately
Biskwiq Lately
Ryce Lately
Luke Combs Memories Are Made Of
JID Workin Out
Clean Bandit Symphony (feat. Zara Larsson)
Zara Larsson Symphony (feat. Zara Larsson)
Abhi The Nomad Sex n' Drugs
Harrison Sands Sex n' Drugs
Copper King Sex n' Drugs
Choker Juno
Metro Boomin No More (feat. Travis Scott, Kodak Black & 21 Savage)
Travis Scott No More (feat. Travis Scott, Kodak Black & 21 Savage)
Kodak Black No More (feat. Travis Scott, Kodak Black & 21 Savage)
21 Savage No More (feat. Travis Scott, Kodak Black & 21 Savage)
SHAUN Way Back Home (feat. Conor Maynard) - Sam Feldt Edit
Conor Maynard Way Back Home (feat. Conor Maynard) - Sam Feldt Edit
Sam Feldt Way Back Home (feat. Conor Maynard) - Sam Feldt Edit
Sure Sure This Must Be The Place
Billie Eilish MyBoi - TroyBoi Remix
TroyBoi MyBoi - TroyBoi Remix
Phil Wickham Living Hope
JPEGMAFIA 1539 N. Calvert
Aminé Invincible
Future 7 Am Freestyle
Juice WRLD 7 Am Freestyle
AViVA Blame It On The Kids
物語シリーズ Renai Circulation
Luke Combs A Long Way
Marc Anthony Est√° Rico
Will Smith Est√° Rico
Bad Bunny Est√° Rico
5 Seconds of Summer Want You Back
Aminé TOGETHER
Alex Rose Darte Remix
Casper Magico Darte Remix
Nengo Flow Darte Remix
Bryant Myers Darte Remix
In [111]:
%sql SELECT * FROM users LIMIT 100;
 * postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
   postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
66 rows affected.
Out[111]:
userid userfullname gender level
1 Adelyn Jordan F free
2 Agnes Lazo F paid
3 Alyssa Fuentebella F free
4 Amos Changchoco M paid
5 Angelo Santiago M paid
6 Anna Ramos F paid
7 Argel Alvezo M paid
8 Bianca Cirio F paid
9 Billy Alegrado M paid
10 Bonnie Ruiz F paid
11 Chris Soriano M paid
12 Cid Ual M free
13 Dale Garcia M paid
14 Dan Nuesca M free
15 Donna Magugat F paid
16 Elizabeth Brooks F free
17 Emily Benson F paid
18 Fred Rara M paid
19 Gab Cadena M paid
20 Gelo Timajo M paid
21 Gil Uy F paid
22 Hayden Brock F paid
23 HJ Bayaborda M free
24 Jaffy Tan M paid
25 Jazel Jayme F free
26 Jill Aiken F free
27 Joaqs Gonzales M free
28 John Gacal M free
29 Julia Las F free
30 K-Ann Carandang F paid
31 Karen Reyes F paid
32 Ken Inocencio M free
33 Kimberly Hohl F paid
34 Lauren Frank F free
35 Law Guidote M paid
36 Lily Cooper F paid
37 Magdalene Herman F paid
38 Mandy Mapa M paid
39 Manu Gaspar M free
40 Marc Castro M free
41 Margs Baluyut F free
42 Mark Acot M paid
43 Marvee dela Resma M paid
44 Mason Hao M paid
45 Mavel Paderes F free
46 Miguel Pingol M paid
47 Mike Tanjango M free
48 Mimay Rodrigo F paid
49 Nancy Cruz F paid
50 Norman Lapid M paid
51 Pao de Guzman M paid
52 Paolo Punzalan M paid
53 Pat Donato F paid
54 Patrick La Rosa M paid
55 Patsy Hanlon F paid
56 Pauline Guevarra F free
57 Sharyl Sy F paid
58 Stacey Crosby F paid
59 Susan Boyer F paid
60 Toby Fernando M paid
61 Ven Velante M free
62 Vince Rivera M paid
63 Viola Conner F paid
64 Weddy Diamada M paid
65 Xidy del Rosario F paid
66 Zhoya Ginez F free
In [ ]:
 

OLAP Schema¶

In [ ]:
pw = getpass.getpass()
In [71]:
%sql postgresql://songmanager:$pw@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
Out[71]:
'Connected: songmanager@songsdb'
In [72]:
%sql \d
   postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
 * postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
6 rows affected.
Out[72]:
schema name type owner
public dim_artists table songmanager
public dim_dates table songmanager
public dim_locations table songmanager
public dim_songs table songmanager
public dim_users table songmanager
public fact_listens table songmanager
In [73]:
%sql \dt
   postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
 * postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
6 rows affected.
Out[73]:
schema name type owner
public dim_artists table songmanager
public dim_dates table songmanager
public dim_locations table songmanager
public dim_songs table songmanager
public dim_users table songmanager
public fact_listens table songmanager
In [91]:
%sql SELECT COUNT (*) FROM dim_artists;
   postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
 * postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
1 rows affected.
Out[91]:
count
10942
In [92]:
%sql SELECT COUNT (*) FROM dim_dates;
   postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
 * postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
1 rows affected.
Out[92]:
count
11469
In [93]:
%sql SELECT COUNT (*) FROM dim_locations;
   postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
 * postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
1 rows affected.
Out[93]:
count
1518
In [94]:
%sql SELECT COUNT (*) FROM dim_songs;
   postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
 * postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
1 rows affected.
Out[94]:
count
2000
In [95]:
%sql SELECT COUNT (*) FROM dim_users;
   postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
 * postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
1 rows affected.
Out[95]:
count
528
In [96]:
%sql SELECT COUNT (*) FROM fact_listens;
   postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
 * postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
1 rows affected.
Out[96]:
count
2841
In [97]:
%sql SELECT * FROM dim_artists LIMIT 100;
   postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
 * postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
100 rows affected.
Out[97]:
artist_id artist
1 David Guetta
2 Bebe Rexha
3 J Balvin
4 6ix9ine
5 Nicki Minaj
6 Kanye West
7 Morgan Wallen
8 Ariana Grande
9 Steve Aoki
10 BTS
11 Forrest.
12 Biskwiq
13 Plies
14 Kevin Gates
15 Shinedown
16 Lil Baby
17 Gunna
18 Lil Durk
19 NAV
20 Seven Lions
21 SLANDER
22 Dabin
23 Dylan Matthew
24 Nipsey Hussle
25 Megan Thee Stallion
26 6LACK
27 Yeek
28 Justin Timberlake
29 Chris Stapleton
30 Night Lovell
31 $uicideBoy$
32 YoungBoy Never Broke Again
33 DaniLeigh
34 Lil Baby
35 Jakob Ogawa
36 KAROL G
37 Anuel AA
38 Banda Los Sebastianes
39 YoungBoy Never Broke Again
40 Shoreline Mafia
41 Morgan Evans
42 Kelsea Ballerini
43 Wallows
44 T3R Elemento
45 Paris Shadows
46 Trippie Redd
47 N.E.R.D
48 Rihanna
49 Drake
50 A$AP Rocky
51 Moby
52 TWICE
53 Kelsea Ballerini
54 Tinashe
55 XXXTENTACION
56 Goody Grace
57 gnash
58 Mitchell Tenpenny
59 Playboi Carti
60 Young Thug
61 Tyga
62 Lil Skies
63 Forrest.
64 Biskwiq
65 Ryce
66 Luke Combs
67 JID
68 Clean Bandit
69 Zara Larsson
70 Abhi The Nomad
71 Harrison Sands
72 Copper King
73 Choker
74 Metro Boomin
75 Travis Scott
76 Kodak Black
77 21 Savage
78 SHAUN
79 Conor Maynard
80 Sam Feldt
81 Sure Sure
82 Billie Eilish
83 TroyBoi
84 Phil Wickham
85 JPEGMAFIA
86 Aminé
87 Future
88 Juice WRLD
89 AViVA
90 物語シリーズ
91 Luke Combs
92 Marc Anthony
93 Will Smith
94 Bad Bunny
95 5 Seconds of Summer
96 Aminé
97 Alex Rose
98 Casper Magico
99 Nengo Flow
100 Bryant Myers
In [98]:
%sql SELECT * FROM dim_dates LIMIT 100;
   postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
 * postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
100 rows affected.
Out[98]:
date_id year month day hour minute second
1 2018 11 1 20 57 10
2 2018 11 7 13 45 11
3 2018 11 14 6 26 17
4 2018 11 14 7 29 17
5 2018 11 14 23 22 14
6 2018 11 19 3 43 28
7 2018 11 1 21 1 46
8 2018 11 5 16 44 24
9 2018 11 7 0 1 16
10 2018 11 7 14 29 36
11 2018 11 8 10 34 27
12 2018 11 12 22 22 22
13 2018 11 14 7 1 5
14 2018 11 14 8 10 20
15 2018 11 1 21 2 12
16 2018 11 2 15 58 28
17 2018 11 2 16 38 59
18 2018 11 8 6 25 57
19 2018 11 8 9 45 43
20 2018 11 14 20 7 53
21 2018 11 1 21 8 16
22 2018 11 8 8 50 49
23 2018 11 14 7 32 16
24 2018 11 18 14 53 26
25 2018 11 19 2 53 22
26 2018 11 1 21 17 33
27 2018 11 2 13 1 57
28 2018 11 2 15 43 16
29 2018 11 2 16 27 56
30 2018 11 7 0 43 0
31 2018 11 7 15 56 48
32 2018 11 13 16 30 47
33 2018 11 13 17 29 55
34 2018 11 14 7 57 38
35 2018 11 19 10 20 0
36 2018 11 20 1 54 59
37 2018 11 20 2 36 47
38 2018 11 1 21 28 54
39 2018 11 3 16 24 53
40 2018 11 7 16 3 48
41 2018 11 14 22 40 13
42 2018 11 18 15 45 13
43 2018 11 20 9 25 53
44 2018 11 1 21 42 0
45 2018 11 2 12 35 41
46 2018 11 2 13 33 3
47 2018 11 2 16 22 13
48 2018 11 3 1 5 50
49 2018 11 10 0 29 18
50 2018 11 10 8 36 7
51 2018 11 14 6 36 13
52 2018 11 15 18 24 32
53 2018 11 15 20 38 46
54 2018 11 18 14 59 38
55 2018 11 18 15 12 45
56 2018 11 18 15 56 27
57 2018 11 1 21 50 15
58 2018 11 2 12 17 4
59 2018 11 2 13 6 37
60 2018 11 2 13 44 33
61 2018 11 4 6 35 26
62 2018 11 4 6 38 37
63 2018 11 14 8 14 44
64 2018 11 14 23 26 53
65 2018 11 15 19 46 21
66 2018 11 15 21 25 45
67 2018 11 20 7 18 5
68 2018 11 1 21 52 5
69 2018 11 3 17 17 26
70 2018 11 7 14 35 56
71 2018 11 13 16 51 11
72 2018 11 14 4 18 14
73 2018 11 14 6 15 56
74 2018 11 14 7 46 3
75 2018 11 14 21 53 46
76 2018 11 20 2 15 5
77 2018 11 20 11 3 59
78 2018 11 1 21 55 25
79 2018 11 2 12 9 30
80 2018 11 2 16 3 53
81 2018 11 15 20 37 12
82 2018 11 17 23 39 51
83 2018 11 18 13 39 14
84 2018 11 20 2 2 0
85 2018 11 1 22 23 14
86 2018 11 2 12 11 32
87 2018 11 3 15 52 36
88 2018 11 5 16 47 50
89 2018 11 7 5 32 6
90 2018 11 10 0 19 28
91 2018 11 14 22 37 59
92 2018 11 20 9 40 32
93 2018 11 20 9 48 40
94 2018 11 20 11 15 14
95 2018 11 20 13 42 19
96 2018 11 2 12 3 7
97 2018 11 3 17 19 54
98 2018 11 6 20 25 43
99 2018 11 7 0 52 29
100 2018 11 7 6 41 58
In [99]:
%sql SELECT * FROM dim_locations LIMIT 100;
   postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
 * postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
100 rows affected.
Out[99]:
user_id longitude latitude
1 121.0577507 14.24777986
2 121.0244985 14.39202707
3 121.0513364 14.66512714
4 121.0323808 14.6287284
5 121.0546188 14.65372321
6 123.8736614 10.32727586
7 121.0598459 14.24851989
8 121.0553765 14.2407114
9 123.8694782 10.30912542
10 121.0327252 14.41835138
11 121.0377921 14.2298179
12 121.0364086 14.65605779
13 121.0450169 14.23488394
14 123.8524017 10.33690971
15 121.0709862 14.24565958
16 123.8014537 10.34546434
17 121.0614733 14.25084438
18 121.0350828 14.64738321
19 123.8582127 10.31223214
20 121.0308051 14.65929261
21 121.0125751 14.42874196
22 121.0265459 14.39094848
23 121.0272516 14.66177014
24 121.0639547 14.2394838
25 123.7952983 10.31724588
26 121.0330756 14.64205712
27 121.0732049 14.25466135
28 123.8251368 10.34494141
29 123.8054298 10.31592916
30 120.9887098 14.44612893
31 121.0488945 14.23072357
32 121.0717828 14.25677277
33 123.8028315 10.32714818
34 121.0566622 14.6427324
35 121.063723 14.24150137
36 123.8747406 10.3220519
37 121.0253021 14.38489132
38 123.8645445 10.31884932
39 121.0586424 14.23406803
40 121.0385185 14.39207613
41 121.0554547 14.24165886
42 121.029903 14.4049765
43 123.8328213 10.31070784
44 121.0056655 14.44359233
45 121.0756029 14.25468692
46 121.0593643 14.65222826
47 121.0398971 14.6628413
48 123.8737461 10.33265349
49 121.0474131 14.66397988
50 121.0345234 14.44554349
51 121.0414076 14.64044368
52 123.8708662 10.3218514
53 121.0464644 14.65888188
54 121.0288163 14.43105088
55 121.0158293 14.6647035
56 121.059258 14.6319537
57 121.0503601 14.64953264
58 121.0298833 14.40808912
59 121.0627138 14.23774099
60 121.017674 14.43148611
61 123.83282 10.30009076
62 121.0201485 14.3833865
63 121.0398209 14.43897897
64 121.0035655 14.44066566
65 121.0397764 14.44222001
66 121.0318088 14.62332026
1 121.0577507 14.24777986
2 121.0244985 14.39202707
3 121.0513364 14.66512714
4 121.0323808 14.6287284
5 121.0546188 14.65372321
6 123.8736614 10.32727586
7 121.0598459 14.24851989
8 121.0553765 14.2407114
9 123.8694782 10.30912542
10 121.0327252 14.41835138
11 121.0377921 14.2298179
12 121.0364086 14.65605779
13 121.0450169 14.23488394
14 123.8524017 10.33690971
15 121.0709862 14.24565958
16 123.8014537 10.34546434
17 121.0614733 14.25084438
18 121.0350828 14.64738321
19 123.8582127 10.31223214
20 121.0308051 14.65929261
21 121.0125751 14.42874196
22 121.0265459 14.39094848
23 121.0272516 14.66177014
24 121.0639547 14.2394838
25 123.7952983 10.31724588
26 121.0330756 14.64205712
27 121.0732049 14.25466135
28 123.8251368 10.34494141
29 123.8054298 10.31592916
30 120.9887098 14.44612893
31 121.0488945 14.23072357
32 121.0717828 14.25677277
33 123.8028315 10.32714818
34 121.0566622 14.6427324
In [100]:
%sql SELECT * FROM dim_songs LIMIT 100;
   postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
 * postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
100 rows affected.
Out[100]:
song_id title year month day duration_ms acousticness danceability energy explicit instrumentalness key liveness loudness mode popularity speechiness tempo valence
3MoV1UsAJmz64LHqyiRMp0 Say My Name 2018 9 14 198947 0.137 0.703 0.66 0 0.0 10 0.192 -5.119 0 66 0.123 95.031 0.79
5FlmoOiXokZHJvESrFbsyg MAMA 2018 11 27 192405 0.0893 0.761 0.672 1 9.77e-06 1 0.14 -5.887 1 68 0.212 160.077 0.437
5x7Fb9wMOtxd7eZzQ9EzVd Gone Girl 2018 4 27 163453 0.287 0.527 0.868 0 0.0 2 0.0885 -2.653 1 61 0.0416 172.086 0.77
5YeHLHDdQ4nKHk81XFWhCU successful 2018 8 17 227387 0.107 0.847 0.603 1 7.4e-06 0 0.106 -4.607 0 67 0.0397 114.045 0.735
66XDf77gBcfQKi4I2vIbTj Waste It On Me 2018 11 9 192537 0.252 0.669 0.684 0 0.0 7 0.72 -4.595 1 70 0.0585 96.097 0.466
1gewwJHPkvysxLEGgOKQuB Why Not Me 2018 7 1 195000 0.48 0.905 0.721 0 0.0 7 0.528 -6.922 1 64 0.31 145.958 0.695
2CY9aDyI5NQLSuvU2d2VzX All Thee Above (feat. Kevin Gates) 2018 10 12 212400 0.00988 0.736 0.568 1 0.0 1 0.0731 -5.973 1 61 0.135 145.93 0.514
1OKQIvaO7yn7R2BpdUFG7D GET UP 2018 5 4 245787 0.0174 0.552 0.935 0 1.98e-06 0 0.116 -4.096 1 63 0.0623 74.954 0.38
6PoKfDY78ih5OveWirggRX Off White VLONE (Lil Baby & Gunna feat. Lil Durk & NAV) 2018 10 5 187465 0.105 0.859 0.506 1 0.0 10 0.113 -8.115 0 63 0.418 140.973 0.153
27YD36FUikKxbp3bKSiKGZ First Time (feat. Dylan Matthew) 2018 10 12 297782 0.251 0.422 0.677 0 3.55e-06 11 0.16 -5.76 1 63 0.0365 150.131 0.146
7yB6GF5kgPEoeGE4EAWiJU Blue Laces 2 2018 2 16 250920 0.0276 0.569 0.836 1 0.0 4 0.479 -3.831 0 62 0.226 75.701 0.0927
2Xc9MR2NMc5IKPFKaEjt1k Hot Girl 2018 12 21 196282 0.000278 0.761 0.637 1 0.0 10 0.251 -5.432 0 65 0.313 185.904 0.374
29ZZAShKhwRUuKULSNtxX6 Switch 2018 9 14 207071 0.0409 0.66 0.658 1 0.00595 7 0.102 -6.127 1 67 0.0387 176.701 0.153
3uwrczwM9A3xX63jlcRhIC Cleaner Air 2018 10 30 128129 0.876 0.717 0.797 1 0.000694 7 0.0836 -5.393 1 64 0.0368 77.492 0.849
09ts3GnICqYEU5PkQCpJK3 Say Something (feat. Chris Stapleton) 2018 1 25 278893 0.103 0.707 0.632 0 1.09e-05 10 0.0841 -7.031 1 60 0.0789 97.04 0.372
5mrOnabhXnvCf5vFF7NJhG Joan of Arc 2018 6 8 145161 0.00375 0.915 0.668 1 0.00241 1 0.253 -2.52 1 64 0.04 124.053 0.0387
2ulXW6DPrhTDt7ZqqYDdhu Genie 2018 6 8 194792 0.286 0.598 0.605 1 0.0 7 0.0875 -4.92 0 62 0.452 159.517 0.513
2fwkl5Z8ujDtAh6qxEA5x0 Lil Bebe (feat. Lil Baby) - Remix 2018 11 30 174720 0.0481 0.856 0.485 1 0.0 5 0.108 -5.519 0 65 0.0597 136.005 0.0929
41o2ydrj7Xm9Yt5odIBqq4 Velvet Light 2018 4 5 141587 0.75 0.67 0.419 0 0.135 6 0.0976 -10.523 0 63 0.0751 77.919 0.513
6q8Lb50EtqNeeJXXe8mMAH Culpables 2018 9 14 226627 0.279 0.728 0.801 0 4.71e-06 8 0.143 -2.942 0 69 0.0683 130.059 0.416
1uqjLVaSyTkdDvfGAmXrHo A Través Del Vaso 2018 8 31 180120 0.621 0.649 0.529 0 0.0 2 0.0563 -4.895 0 65 0.0398 144.869 0.964
2YVt7k1qZXFhzYDe8FMDW6 No Mentions 2018 9 21 220260 0.27 0.803 0.54 1 0.0 1 0.0819 -6.862 0 61 0.318 139.989 0.345
4k0ZX0KlMBEOlTp5vexIrT Bottle Service 2018 5 4 156837 0.113 0.849 0.582 1 0.0 8 0.152 -6.407 1 62 0.214 100.049 0.516
07sa7qkxpemze5BBE7jufs Dance with Me (feat. Kelsea Ballerini) 2018 10 12 199501 0.481 0.611 0.519 0 4.82e-06 1 0.107 -9.854 1 63 0.035 147.981 0.462
3VzULnaqPKf1deQyZ5W4Ah 1980s Horror Film II 2018 10 11 235272 0.0038 0.593 0.658 0 0.0126 2 0.0804 -4.957 1 65 0.0304 116.021 0.27
0laoKkCyhp5HW3wGejhPV5 En Menos de Un Minuto 2018 11 16 177714 0.554 0.702 0.621 1 0.0 7 0.226 -6.884 1 65 0.0296 114.532 0.581
2J84DvAAI8ph5DYojk3HPt GONE (feat. Trippie Redd) 2018 4 25 190893 0.00107 0.712 0.731 1 0.0 9 0.219 -4.555 1 64 0.0335 118.016 0.876
739vCwA3EpBSkk3uDsI2wB Lemon (feat. Drake) - Drake Remix 2018 3 18 226107 0.000738 0.777 0.657 1 3.34e-06 1 0.0932 -6.434 1 64 0.237 189.992 0.245
1YmF9PvLhIISIANoMLIYGq A$AP Forever (feat. Moby) 2018 4 5 233061 0.224 0.469 0.78 1 0.0 10 0.13 -6.49 0 67 0.122 125.999 0.444
1IX47gefluXmKX4PrTBCRM What is Love 2018 7 9 208240 0.0988 0.574 0.909 0 0.0 8 0.264 -1.572 1 72 0.0655 169.964 0.565
0Sbeihd75SGitymHS8Bqbh I Hate Love Songs 2018 10 26 191187 0.252 0.386 0.446 0 0.0 11 0.18 -7.807 1 62 0.0297 179.995 0.4
7D8aQaRzoi9Qzz5yerVK5b Throw A Fit 2018 7 27 158757 0.00597 0.888 0.586 1 0.0365 0 0.153 -6.745 1 66 0.0604 150.051 0.401
4axYWgXASdsCBHuwInAubT I don’t let go 2018 12 7 121630 0.591 0.797 0.293 1 0.0724 5 0.148 -7.479 1 65 0.08 80.035 0.478
1746Hs7bQJMMYAZvVYM3ig Two Shots (feat. gnash) 2018 3 23 179915 0.0542 0.768 0.408 1 0.0 4 0.106 -8.551 0 61 0.0714 129.96 0.0905
5SO7pvSFiXwU0e1z7C4X7H Alcohol You Later 2018 12 14 186600 0.0259 0.671 0.597 0 0.0 8 0.116 -7.011 1 62 0.0351 100.039 0.422
5O9zs6G6RcB6yP1OKwnwiM Choppa Won't Miss (feat. Young Thug) 2018 5 11 217960 0.0182 0.858 0.661 1 0.0 0 0.115 -6.544 1 63 0.319 163.056 0.611
55GiOwtxNajXVbpjjIzMnD SWISH 2018 7 25 194936 0.0831 0.884 0.541 1 0.0 11 0.118 -5.608 1 70 0.221 105.009 0.274
0chpIFSjgfd1CPrAxM7PWp No Rest 2018 8 13 135535 0.0186 0.655 0.498 1 4.47e-06 9 0.139 -13.147 1 64 0.0479 162.861 0.508
5awvelCGpDQHwgZem0ira9 Lately 2018 11 1 207000 0.213 0.718 0.632 0 1.28e-05 0 0.283 -4.011 1 64 0.145 160.039 0.859
0h31TZMlv0ZLc5yppKngwk Memories Are Made Of 2018 6 1 216667 0.0159 0.694 0.718 0 1.61e-06 2 0.287 -6.188 1 62 0.0322 121.078 0.364
46Lx5epWOQBA3J86ovnxmV Workin Out 2018 11 26 226859 0.384 0.818 0.57 1 0.000102 10 0.19 -8.067 0 64 0.285 126.938 0.561
72gv4zhNvRVdQA0eOenCal Symphony (feat. Zara Larsson) 2018 11 30 212733 0.239 0.715 0.605 0 1.41e-05 0 0.189 -5.128 0 74 0.0428 122.956 0.454
0G2wimhVoDYXbQ6csDxtSf Sex n' Drugs 2018 2 9 217778 0.299 0.831 0.32 0 0.0 7 0.0757 -7.916 1 65 0.143 80.935 0.319
6OpU3cvY19M2hFxTbLKy5L Juno 2018 8 3 231681 0.921 0.442 0.267 1 0.00026 7 0.148 -12.857 1 63 0.0365 120.075 0.116
42k9e2hgl98IdgkXLg4pxs No More (feat. Travis Scott, Kodak Black & 21 Savage) 2018 11 6 265120 0.413 0.636 0.501 1 0.0 11 0.339 -7.26 0 65 0.329 82.309 0.196
1ZLrDPgR7mvuTco3rQK8Pk Way Back Home (feat. Conor Maynard) - Sam Feldt Edit 2018 12 21 192453 0.269 0.737 0.747 0 0.0 10 0.219 -4.818 1 75 0.0323 105.943 0.447
6PuoZT4kgw5DrUEdnQ6e01 This Must Be The Place 2018 1 12 219261 0.349 0.886 0.371 0 0.00138 5 0.0985 -10.542 1 63 0.0465 115.026 0.928
1T3E8z6VMXSsACrBSQHTkN MyBoi - TroyBoi Remix 2018 3 9 211304 0.00626 0.879 0.58 0 0.0433 2 0.0653 -4.797 1 68 0.0883 92.004 0.312
6nVm313QmsPlNllntTart1 Living Hope 2018 8 3 327000 0.0304 0.211 0.558 0 0.0 3 0.0978 -6.232 1 61 0.041 143.452 0.317
6XyxCBp6x3jvtxXvMN5sAA 1539 N. Calvert 2018 1 19 157259 0.278 0.863 0.717 1 0.0826 4 0.0991 -3.143 0 64 0.17 152.05 0.357
6VS7wKwtvL2FvTupYSWZ9e Invincible 2018 12 14 196387 0.141 0.724 0.6 0 0.000218 10 0.104 -4.391 0 65 0.29 86.984 0.466
1hHe7cirScEJWgAS0JFqTL 7 Am Freestyle 2018 10 19 192907 0.0185 0.951 0.401 1 0.0 0 0.149 -8.92 0 63 0.411 129.976 0.539
1jfBIylYmLMklQOpbWT5s0 Blame It On The Kids 2018 11 16 196341 0.364 0.719 0.44 0 0.0 6 0.163 -6.266 0 67 0.166 133.013 0.505
5P8lyudWE7HQxb4ludLbEm Renai Circulation 2018 11 2 255040 0.0524 0.814 0.785 0 0.00111 1 0.111 -5.679 0 68 0.0396 120.009 0.941
3z1QH1O1h15lFROjcECqgi A Long Way 2018 6 1 217280 0.0194 0.483 0.713 0 0.000151 2 0.316 -5.027 1 62 0.0295 161.905 0.594
2wRkBumdItthjYP9XknImg Est√° Rico 2018 9 28 215151 0.266 0.817 0.74 0 0.0 11 0.0992 -4.566 0 69 0.283 130.04 0.674
2vHfabj6nFebekTYODqntl Want You Back 2018 6 15 173082 0.00699 0.618 0.867 1 1.03e-06 0 0.202 -4.915 1 68 0.0439 98.03 0.579
45WkcBtgPS8xulUwEg63X1 TOGETHER 2018 8 15 214707 0.469 0.622 0.375 1 1.28e-05 4 0.185 -8.115 0 63 0.208 132.887 0.357
6xqbec9MHRmW7qtEINAlRx Darte Remix 2018 7 27 512093 0.144 0.81 0.631 1 1.32e-06 4 0.451 -6.541 0 70 0.0938 154.976 0.573
7f4bRMqRKRfUNPW2gbTHSt Save Me From Myself (with NoMBe & Big Gigantic) 2018 10 26 211686 0.0198 0.673 0.676 0 1.26e-05 4 0.428 -4.888 0 62 0.216 82.979 0.391
1LV6zxUFIUyePwsdFHNIYA El de la Gorra 27 2018 4 20 200071 0.781 0.663 0.785 0 2.33e-06 11 0.113 -3.874 0 67 0.0365 107.455 0.84
3FskQrDXcY24ur2fCvz35O Ye 2018 1 26 231618 0.0181 0.503 0.592 1 0.00014 5 0.169 -5.923 0 72 0.444 201.85 0.344
3TgelNfEK1DnYrsHOBxaVJ El De Los Lentes Gucci 2018 7 10 125412 0.37 0.765 0.539 0 0.0 11 0.039 -5.028 0 68 0.0437 99.94 0.965
1D23GEeQ32aqLw5ZnY3Ivp COOLEST MONKEY IN THE JUNGLE 2018 5 11 152213 0.324 0.893 0.676 1 0.0 9 0.0968 -4.867 0 63 0.227 125.923 0.804
4BiiOzZCrXEzHRLYcYFiD5 Hope 2018 12 14 180120 0.488 0.773 0.699 0 3.6e-06 1 0.0814 -5.982 0 71 0.0958 104.941 0.513
6e13443Ve7RGcAUScTgYtl The Last Of The Real Ones 2018 1 19 230400 0.00599 0.576 0.872 0 0.0 8 0.188 -5.293 1 63 0.0471 115.042 0.559
2yI8omCrTAyrTcK7NRppu7 Ocean (feat. Jacquees) 2018 6 22 219437 0.268 0.709 0.264 1 0.0 1 0.24 -17.696 0 62 0.186 133.858 0.341
2lRe5wBRm4xaSKTbDn2vLD Trippy (feat. J. Cole) 2018 11 16 323587 0.0707 0.57 0.65 1 0.0 2 0.339 -8.538 1 65 0.326 96.767 0.174
1j8gLQNfFOI0vs8qcVTWwa ¿Por Qué Cambiaste De Opinión? 2018 7 20 241907 0.446 0.735 0.49 0 3.52e-06 5 0.129 -4.937 1 69 0.0299 135.159 0.89
4n1bdaKwynQndm47x5HqWX Blue Tint 2018 6 29 162800 0.013 0.888 0.372 1 0.0 7 0.184 -10.254 0 64 0.358 133.998 0.123
2u6XJ2PPaCyyRJqC0x6TMH Through the Storm 2018 6 8 163567 0.631 0.881 0.581 1 0.0 5 0.0986 -5.162 1 61 0.0901 78.508 0.876
6euzJRzwBig3NESMppkxOK All I Got 2018 2 16 218704 0.12 0.667 0.509 0 1.91e-06 1 0.269 -6.584 1 64 0.0348 104.995 0.289
4uTFofZ26M2VgCVq3ixUUY Haunt U 2018 2 24 100260 0.393 0.463 0.632 1 0.0 7 0.127 -9.577 1 68 0.0525 163.71 0.0894
4ToogKQq58R9TvO40qRuxi I Wanna Know 2018 7 27 206727 0.501 0.432 0.85 0 6.59e-06 10 0.203 -4.417 1 63 0.0771 82.607 0.109
1KnJwmgLyBfB7u920qLbEq Romance Garbage 2018 7 7 122070 0.152 0.467 0.553 0 0.683 6 0.291 -9.692 0 64 0.1 144.943 0.39
0FbSpY8RPxTRxi1Fpj9Dz9 Booty 2018 2 27 160038 0.000831 0.938 0.549 1 0.000243 1 0.082 -8.368 1 62 0.098 144.005 0.412
08zAFXhfyIxq9359NHksdP Do Not Disturb (feat. Lil Yachty & Offset) 2018 4 13 157557 0.203 0.878 0.647 1 0.00185 10 0.335 -4.566 0 64 0.0659 131.027 0.212
72B1Omo2kmEst4YKO8vOB6 Scars 2018 3 2 238307 0.048 0.417 0.524 0 0.0 10 0.103 -6.6 1 62 0.0326 75.266 0.152
2SPEg45ZaUFJCg9GwNmRvw None Of My Business 2018 10 19 184005 0.428 0.793 0.488 0 0.0 2 0.0823 -5.807 0 68 0.0462 133.957 0.5
3ZsLuJ8uU0lE6rGTkDaqEg Alone Together 2018 6 22 171440 0.00527 0.541 0.87 0 0.0 1 0.0867 -4.741 1 62 0.122 179.975 0.665
2TLDilmv1UmRTfyb7f2Hrx One That Got Away 2018 6 1 218733 0.00303 0.565 0.825 0 0.0 10 0.0779 -4.245 1 61 0.0417 160.003 0.609
4qjLvvBh5ZeKEPyShKRf06 Narcissist (feat. The 1975) 2018 8 28 195502 0.218 0.543 0.682 1 1.53e-06 3 0.085 -5.81 1 67 0.113 179.969 0.424
41a7dZcq30Ss5kPMayWRV0 That’s How You Feel 2018 6 29 157605 0.117 0.85 0.473 1 0.0105 10 0.108 -5.747 0 65 0.209 85.454 0.38
48x8fBWfyeSMzLl5gbsjr9 COME THRU 2018 10 26 153169 0.163 0.66 0.596 1 0.86 0 0.109 -8.207 1 65 0.0358 77.026 0.0757
1FDV80LC5JnTDAEjPDovER Worthy Of You 2018 12 6 190619 0.439 0.451 0.452 0 0.0 1 0.478 -9.363 0 66 0.0574 57.523 0.386
33DzKnwuBE6lfOiADwzd5E TOOTIMETOOTIMETOOTIME 2018 11 30 207851 0.177 0.667 0.824 0 0.0152 8 0.0933 -5.867 1 68 0.0665 114.998 0.518
1x3W8RZxW94lrVGhP95qA6 You Are The Reason - Duet Version 2018 11 30 190760 0.382 0.403 0.356 0 0.0 10 0.148 -7.547 1 72 0.0298 171.454 0.151
3lWzVNe1yFZlkeBBzUuZYu Bella Ciao - M√∫sica Original de la Serie la Casa de Papel/ Money Heist 2018 4 20 139845 0.538 0.46 0.564 0 0.0 9 0.191 -10.329 0 78 0.0343 135.178 0.962
18xmRkPWE0sGUzFyNNAAhN Dying in LA 2018 6 22 229053 0.876 0.346 0.35 0 1.98e-06 2 0.0708 -4.988 1 65 0.0328 126.444 0.315
6DuEfVcF8bmEkiFKr41SiM Prayed For You - Acoustic 2018 11 16 205915 0.63 0.713 0.386 0 0.0 4 0.151 -6.995 1 61 0.0291 139.968 0.52
5sWbwccBcyHsg5LEKWGZo9 BRACKETS 2018 4 20 315771 0.19 0.675 0.567 1 1.77e-06 10 0.175 -9.147 0 63 0.263 84.039 0.658
3BLKY8w5XTehYgLb3cFZgg Deadman's Curve (Live) 2018 6 29 207453 0.731 0.392 0.496 0 0.0 2 0.908 -9.664 1 60 0.203 172.243 0.604
6dJJT9h7FzjenIaa31bw8p Yo Fui El Elegante 2018 2 12 146413 0.296 0.785 0.831 0 0.0 9 0.176 -4.439 1 66 0.193 134.829 0.581
7IOxauZfx5l18mYfX0GZm6 Knees 2018 6 22 206581 0.0781 0.777 0.577 0 1.02e-06 11 0.0869 -6.872 0 68 0.0357 109.019 0.381
0r9d60ZYPAlFHq8qnUtkXN Madura (feat. Bad Bunny) 2018 4 13 190909 0.138 0.878 0.555 0 1.34e-06 1 0.151 -7.639 0 70 0.202 88.029 0.751
645YBsxhFPSOKxGep3uHLv Love Maze 2018 5 18 221529 0.0487 0.636 0.876 0 0.0 9 0.0728 -4.652 1 70 0.0617 149.932 0.759
6Z7m5DLNPOaPg9T6nXK0rQ DNA 2018 8 24 223123 0.0223 0.599 0.774 0 0.0 1 0.0669 -4.014 0 72 0.0541 129.819 0.696
48XfojJC87VKLlUotYV0Gm The Truth Untold 2018 5 18 242293 0.372 0.57 0.422 0 0.0 1 0.127 -7.561 0 70 0.0422 113.007 0.169
46za9pKXWkesH9DCrBbtNU Low 2018 4 28 233963 0.689 0.746 0.438 1 0.103 4 0.126 -7.016 0 64 0.0326 119.991 0.042
50Hv5NZIM0puIUoBttjpfb King of the Clouds 2018 6 22 160747 0.237 0.285 0.922 0 0.0 2 0.159 -1.932 0 64 0.247 198.063 0.472
In [101]:
%sql SELECT * FROM dim_users LIMIT 100;
   postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
 * postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
100 rows affected.
Out[101]:
id user_id full_name gender level
0 1 Adelyn Jordan F free
1 2 Agnes Lazo F paid
2 3 Alyssa Fuentebella F free
3 4 Amos Changchoco M paid
4 5 Angelo Santiago M paid
5 6 Anna Ramos F paid
6 7 Argel Alvezo M paid
7 8 Bianca Cirio F paid
8 9 Billy Alegrado M paid
9 10 Bonnie Ruiz F paid
10 11 Chris Soriano M paid
11 12 Cid Ual M free
12 13 Dale Garcia M paid
13 14 Dan Nuesca M free
14 15 Donna Magugat F paid
15 16 Elizabeth Brooks F free
16 17 Emily Benson F paid
17 18 Fred Rara M paid
18 19 Gab Cadena M paid
19 20 Gelo Timajo M paid
20 21 Gil Uy F paid
21 22 Hayden Brock F paid
22 23 HJ Bayaborda M free
23 24 Jaffy Tan M paid
24 25 Jazel Jayme F free
25 26 Jill Aiken F free
26 27 Joaqs Gonzales M free
27 28 John Gacal M free
28 29 Julia Las F free
29 30 K-Ann Carandang F paid
30 31 Karen Reyes F paid
31 32 Ken Inocencio M free
32 33 Kimberly Hohl F paid
33 34 Lauren Frank F free
34 35 Law Guidote M paid
35 36 Lily Cooper F paid
36 37 Magdalene Herman F paid
37 38 Mandy Mapa M paid
38 39 Manu Gaspar M free
39 40 Marc Castro M free
40 41 Margs Baluyut F free
41 42 Mark Acot M paid
42 43 Marvee dela Resma M paid
43 44 Mason Hao M paid
44 45 Mavel Paderes F free
45 46 Miguel Pingol M paid
46 47 Mike Tanjango M free
47 48 Mimay Rodrigo F paid
48 49 Nancy Cruz F paid
49 50 Norman Lapid M paid
50 51 Pao de Guzman M paid
51 52 Paolo Punzalan M paid
52 53 Pat Donato F paid
53 54 Patrick La Rosa M paid
54 55 Patsy Hanlon F paid
55 56 Pauline Guevarra F free
56 57 Sharyl Sy F paid
57 58 Stacey Crosby F paid
58 59 Susan Boyer F paid
59 60 Toby Fernando M paid
60 61 Ven Velante M free
61 62 Vince Rivera M paid
62 63 Viola Conner F paid
63 64 Weddy Diamada M paid
64 65 Xidy del Rosario F paid
65 66 Zhoya Ginez F free
0 1 Adelyn Jordan F free
1 2 Agnes Lazo F paid
2 3 Alyssa Fuentebella F free
3 4 Amos Changchoco M paid
4 5 Angelo Santiago M paid
5 6 Anna Ramos F paid
6 7 Argel Alvezo M paid
7 8 Bianca Cirio F paid
8 9 Billy Alegrado M paid
9 10 Bonnie Ruiz F paid
10 11 Chris Soriano M paid
11 12 Cid Ual M free
12 13 Dale Garcia M paid
13 14 Dan Nuesca M free
14 15 Donna Magugat F paid
15 16 Elizabeth Brooks F free
16 17 Emily Benson F paid
17 18 Fred Rara M paid
18 19 Gab Cadena M paid
19 20 Gelo Timajo M paid
20 21 Gil Uy F paid
21 22 Hayden Brock F paid
22 23 HJ Bayaborda M free
23 24 Jaffy Tan M paid
24 25 Jazel Jayme F free
25 26 Jill Aiken F free
26 27 Joaqs Gonzales M free
27 28 John Gacal M free
28 29 Julia Las F free
29 30 K-Ann Carandang F paid
30 31 Karen Reyes F paid
31 32 Ken Inocencio M free
32 33 Kimberly Hohl F paid
33 34 Lauren Frank F free
In [102]:
%sql SELECT * FROM fact_listens LIMIT 100;
   postgresql://love_keeper:***@finalproj-db.cydd5y33n4ue.us-east-1.rds.amazonaws.com/love_database
 * postgresql://songmanager:***@new-cluster.cjptwjq0k7m8.us-east-1.redshift.amazonaws.com:5439/songsdb
100 rows affected.
Out[102]:
user_id song_id title date_id
26 2UVbBKQOdFAekPTRsnkzcf Stir Fry 1
50 2UVbBKQOdFAekPTRsnkzcf Stir Fry 2
24 2UVbBKQOdFAekPTRsnkzcf Stir Fry 3
15 2UVbBKQOdFAekPTRsnkzcf Stir Fry 4
52 2UVbBKQOdFAekPTRsnkzcf Stir Fry 5
11 2UVbBKQOdFAekPTRsnkzcf Stir Fry 6
23 1Jl21sidRpEdlVBvYbssLx Sin (feat. Jaden Smith) 7
13 1Jl21sidRpEdlVBvYbssLx Sin (feat. Jaden Smith) 8
40 1Jl21sidRpEdlVBvYbssLx Sin (feat. Jaden Smith) 9
10 1Jl21sidRpEdlVBvYbssLx Sin (feat. Jaden Smith) 10
13 1Jl21sidRpEdlVBvYbssLx Sin (feat. Jaden Smith) 11
36 1Jl21sidRpEdlVBvYbssLx Sin (feat. Jaden Smith) 12
27 1Jl21sidRpEdlVBvYbssLx Sin (feat. Jaden Smith) 13
43 1Jl21sidRpEdlVBvYbssLx Sin (feat. Jaden Smith) 14
23 20dkebBtPR4AJ8vVos41Gw Rock Salt and Nails (Live) 15
27 20dkebBtPR4AJ8vVos41Gw Rock Salt and Nails (Live) 16
32 20dkebBtPR4AJ8vVos41Gw Rock Salt and Nails (Live) 17
58 20dkebBtPR4AJ8vVos41Gw Rock Salt and Nails (Live) 18
1 20dkebBtPR4AJ8vVos41Gw Rock Salt and Nails (Live) 19
48 20dkebBtPR4AJ8vVos41Gw Rock Salt and Nails (Live) 20
23 6t94PGXYAuo9GbJEnMpPcW La Forma en Que Me Miras 21
58 6t94PGXYAuo9GbJEnMpPcW La Forma en Que Me Miras 22
15 6t94PGXYAuo9GbJEnMpPcW La Forma en Que Me Miras 23
12 6t94PGXYAuo9GbJEnMpPcW La Forma en Que Me Miras 24
5 6t94PGXYAuo9GbJEnMpPcW La Forma en Que Me Miras 25
47 4TwVtW8hS5LyLoDtJGpUOg History 26
57 4TwVtW8hS5LyLoDtJGpUOg History 27
32 4TwVtW8hS5LyLoDtJGpUOg History 28
14 4TwVtW8hS5LyLoDtJGpUOg History 29
54 4TwVtW8hS5LyLoDtJGpUOg History 30
25 4TwVtW8hS5LyLoDtJGpUOg History 31
38 4TwVtW8hS5LyLoDtJGpUOg History 32
15 4TwVtW8hS5LyLoDtJGpUOg History 33
41 4TwVtW8hS5LyLoDtJGpUOg History 34
23 4TwVtW8hS5LyLoDtJGpUOg History 35
22 4TwVtW8hS5LyLoDtJGpUOg History 36
40 4TwVtW8hS5LyLoDtJGpUOg History 37
23 6fS1CEMY4LlvQNWuUMoWEQ Mockingbird 38
31 6fS1CEMY4LlvQNWuUMoWEQ Mockingbird 39
48 6fS1CEMY4LlvQNWuUMoWEQ Mockingbird 40
9 6fS1CEMY4LlvQNWuUMoWEQ Mockingbird 41
25 6fS1CEMY4LlvQNWuUMoWEQ Mockingbird 42
24 6fS1CEMY4LlvQNWuUMoWEQ Mockingbird 43
26 4ve2uzqdwnHr20G5YgMMqr Missin You Crazy 44
2 4ve2uzqdwnHr20G5YgMMqr Missin You Crazy 45
42 4ve2uzqdwnHr20G5YgMMqr Missin You Crazy 46
29 4ve2uzqdwnHr20G5YgMMqr Missin You Crazy 47
34 4ve2uzqdwnHr20G5YgMMqr Missin You Crazy 48
66 4ve2uzqdwnHr20G5YgMMqr Missin You Crazy 49
18 4ve2uzqdwnHr20G5YgMMqr Missin You Crazy 50
37 4ve2uzqdwnHr20G5YgMMqr Missin You Crazy 51
15 4ve2uzqdwnHr20G5YgMMqr Missin You Crazy 52
11 4ve2uzqdwnHr20G5YgMMqr Missin You Crazy 53
16 4ve2uzqdwnHr20G5YgMMqr Missin You Crazy 54
35 4ve2uzqdwnHr20G5YgMMqr Missin You Crazy 55
52 4ve2uzqdwnHr20G5YgMMqr Missin You Crazy 56
47 0WVAQaxrT0wsGEG4BCVSn2 Born To Be Yours 57
57 0WVAQaxrT0wsGEG4BCVSn2 Born To Be Yours 58
35 0WVAQaxrT0wsGEG4BCVSn2 Born To Be Yours 59
1 0WVAQaxrT0wsGEG4BCVSn2 Born To Be Yours 60
36 0WVAQaxrT0wsGEG4BCVSn2 Born To Be Yours 61
9 0WVAQaxrT0wsGEG4BCVSn2 Born To Be Yours 62
29 0WVAQaxrT0wsGEG4BCVSn2 Born To Be Yours 63
52 0WVAQaxrT0wsGEG4BCVSn2 Born To Be Yours 64
44 0WVAQaxrT0wsGEG4BCVSn2 Born To Be Yours 65
15 0WVAQaxrT0wsGEG4BCVSn2 Born To Be Yours 66
39 0WVAQaxrT0wsGEG4BCVSn2 Born To Be Yours 67
47 5qHirGR7M9tdm6C17DlzSY REEL IT IN 68
33 5qHirGR7M9tdm6C17DlzSY REEL IT IN 69
2 5qHirGR7M9tdm6C17DlzSY REEL IT IN 70
39 5qHirGR7M9tdm6C17DlzSY REEL IT IN 71
42 5qHirGR7M9tdm6C17DlzSY REEL IT IN 72
24 5qHirGR7M9tdm6C17DlzSY REEL IT IN 73
27 5qHirGR7M9tdm6C17DlzSY REEL IT IN 74
36 5qHirGR7M9tdm6C17DlzSY REEL IT IN 75
30 5qHirGR7M9tdm6C17DlzSY REEL IT IN 76
38 5qHirGR7M9tdm6C17DlzSY REEL IT IN 77
47 5N5k9nd479b1xpDZ4usjrg Promises (with Sam Smith) 78
45 5N5k9nd479b1xpDZ4usjrg Promises (with Sam Smith) 79
27 5N5k9nd479b1xpDZ4usjrg Promises (with Sam Smith) 80
11 5N5k9nd479b1xpDZ4usjrg Promises (with Sam Smith) 81
21 5N5k9nd479b1xpDZ4usjrg Promises (with Sam Smith) 82
51 5N5k9nd479b1xpDZ4usjrg Promises (with Sam Smith) 83
42 5N5k9nd479b1xpDZ4usjrg Promises (with Sam Smith) 84
47 6hkQ6OQ6nhe7QCckH91aGa Victory Lap (feat. Stacy Barthe) 85
60 6hkQ6OQ6nhe7QCckH91aGa Victory Lap (feat. Stacy Barthe) 86
54 6hkQ6OQ6nhe7QCckH91aGa Victory Lap (feat. Stacy Barthe) 87
62 6hkQ6OQ6nhe7QCckH91aGa Victory Lap (feat. Stacy Barthe) 88
59 6hkQ6OQ6nhe7QCckH91aGa Victory Lap (feat. Stacy Barthe) 89
64 6hkQ6OQ6nhe7QCckH91aGa Victory Lap (feat. Stacy Barthe) 90
9 6hkQ6OQ6nhe7QCckH91aGa Victory Lap (feat. Stacy Barthe) 91
36 6hkQ6OQ6nhe7QCckH91aGa Victory Lap (feat. Stacy Barthe) 92
19 6hkQ6OQ6nhe7QCckH91aGa Victory Lap (feat. Stacy Barthe) 93
19 6hkQ6OQ6nhe7QCckH91aGa Victory Lap (feat. Stacy Barthe) 94
56 6hkQ6OQ6nhe7QCckH91aGa Victory Lap (feat. Stacy Barthe) 95
35 0Y9D1Bc4fAkeZnVsGKdij6 WANTED U 96
33 0Y9D1Bc4fAkeZnVsGKdij6 WANTED U 97
6 0Y9D1Bc4fAkeZnVsGKdij6 WANTED U 98
40 0Y9D1Bc4fAkeZnVsGKdij6 WANTED U 99
16 0Y9D1Bc4fAkeZnVsGKdij6 WANTED U 100

Screen%20Shot%202022-06-25%20at%204.32.33%20PM.png

Screen%20Shot%202022-06-25%20at%204.32.58%20PM.png

Screen%20Shot%202022-06-25%20at%204.33.20%20PM.png

Screen%20Shot%202022-06-25%20at%204.33.43%20PM.png

Screen%20Shot%202022-06-25%20at%204.33.59%20PM.png

Screen%20Shot%202022-06-25%20at%204.34.20%20PM.png

Code for App¶

In [ ]:
import pandas as pd
import tkinter
import customtkinter
import os
from tkinter import *
from PIL import ImageTk, Image

loc_df = pd.read_csv('https://9kuki1mnrb.execute-api.us-east-1.amazonaws.com/deploy/recommender/loc_match.csv')
song_df = pd.read_csv('https://9kuki1mnrb.execute-api.us-east-1.amazonaws.com/deploy/recommender/song_match.csv')
user_df = pd.read_csv('https://9kuki1mnrb.execute-api.us-east-1.amazonaws.com/deploy/recommender/user_map.csv')

user_df['userid'] = user_df['userid'].astype(str)
user_map = pd.Series(user_df.userfullname.values, index=user_df.userid).to_dict()

loc_df.columns = loc_df.columns.astype(str)
loc_df.index = loc_df.index.astype(str)
loc_df['userID'] = loc_df['userID'].astype(str)
loc_df = loc_df.rename(columns=user_map)
loc_df['userID'] = loc_df['userID'].apply(lambda x: user_map[x])

new_cols = song_df['userID'].tolist()
new_cols.insert(0, 'userID')
song_df.columns = new_cols
song_df.columns = song_df.columns.astype(str)
song_df.index = song_df.index.astype(str)
song_df['userID'] = song_df['userID'].astype(str)
song_df = song_df.rename(columns=user_map)
song_df['userID'] = song_df['userID'].apply(lambda x: user_map[x])

def matches():
    user_id = entry.get()
    max_dist = slider.get()
    
    loc_match = loc_df[loc_df['userID'] == user_id].drop(['userID', user_id], axis=1).T
    loc_match.columns = ['matches']
    loc_match = list(loc_match[loc_match['matches'] < max_dist].index)
    song_score = song_df[song_df['userID'].isin(loc_match)][user_id]
    
    match_dict = {k:v for k, v in zip(loc_match, song_score)}
    sorted_matches = sorted(match_dict, key=match_dict.get, reverse=True)
    
    output = (str('You Found:')+'\n'+'\n'+
            str(sorted_matches[0])+'\n'+'\n'+
              (sorted_matches[1])+'\n'+'\n'+
              str(sorted_matches[2]))
    
    label.config(text=output, width=100, height=150)
    
customtkinter.set_appearance_mode("dark")  # Modes: "System" (standard), "Dark", "Light"
customtkinter.set_default_color_theme("blue")  # Themes: "blue" (standard), "green", "dark-blue"

root_tk = customtkinter.CTk()
root_tk.geometry(f"{300}x{500}")
root_tk.title("Tindify")
    

def get_dist(value):
    label2.config(text=f"Looking {int(value)}km away")

img = ImageTk.PhotoImage(Image.open("logo.png"))
panel = Label(root_tk, image = img)
panel.pack(side = "top", fill = "both", expand = "yes")

entry = customtkinter.CTkEntry(master=root_tk,
                               placeholder_text="What's your name?",
                               width=150,
                               height=25)
entry.place(relx=0.5, rely=0.73, anchor=tkinter.CENTER)

label2=customtkinter.CTkLabel(root_tk, text='',bg_color='#424242')
label2.place(relx=0.5, rely=0.80,anchor=tkinter.CENTER)

slider = customtkinter.CTkSlider(master=root_tk, 
                                 from_=0, to=50, 
                                 command=get_dist, 
                                 width= 150, button_color='#E84B76')
slider.place(relx=0.5, rely=0.87, anchor=tkinter.CENTER)

button = customtkinter.CTkButton(root_tk, text="Find Love", command=matches, fg_color='#E84B76')
button.place(relx=0.5, rely=0.95, anchor=tkinter.CENTER)

label=customtkinter.CTkLabel(root_tk, text='',bg_color='#424242', width=100, height=100)
label.place(relx=0.5, rely=0.55,anchor=tkinter.CENTER)



root_tk.mainloop()
In [ ]: